Abstract

The Generalized Traveling Salesman Problem (GTSP) is a well-known combinatorial optimization problem
with a host of applications. It is an extension of the Traveling Salesman Problem (TSP) where the
set of cities is partitioned into so-called clusters, and the salesman has to visit every cluster exactly once.

While the GTSP is a very important combinatorial optimization problem and is well studied in many
aspects, the local search algorithms used in the literature are mostly basic adaptations of simple TSP
heuristics. Hence, a thorough study of the neighborhoods and local search algorithms specific
to the GTSP is required.

We formalize the procedure of adaptation of a TSP neighborhood for the GTSP and classify all other 
existing and some new GTSP neighborhoods. For every neighborhood, we provide efficient exploration 
algorithms that are often significantly faster than the ones known from the literature. Finally, we compare 
different local search implementations empirically. 

Keywords: Heuristics, Local Search, Neighborhood, Generalized Traveling Salesman Problem, 
Combinatorial Optimization. 



1. Introduction 

The Generalized Traveling Salesman Problem (GTSP) is an extension of the Traveling Salesman
Problem (TSP). In the GTSP, we are given a set $V$ of $n$ vertices, weights $w(x, y)$ of going from $x \in V$ to
$y \in V$, and a partition of $V$ into clusters $C_1, C_2, \ldots, C_m$. A feasible solution, or a tour, is a cycle visiting
exactly one vertex in every cluster. The objective is to find the shortest tour.

If the weight matrix is symmetric, i.e., $w(x, y) = w(y, x)$ for any $x, y \in V$, the problem is called
symmetric. Otherwise it is an asymmetric GTSP.

Observe that the TSP is a special case of the GTSP when $|C_i| = 1$ for each $i$ and, hence, the GTSP
is NP-hard.

The GTSP has a host of applications: warehouse order picking with multiple stock locations, sequencing
computer files, postal routing, airport selection and routing for courier planes, and some others, see,
e.g., (Fischetti et al., 1995, 1997; Laporte et al., 1996; Noon and Bean, 1991) and references therein.

Much attention was paid to solving the GTSP. Several researchers (Ben-Arieh et al., 2003; Laporte and Semet,
1999; Noon and Bean, 1993) proposed transformations of a GTSP instance into a TSP instance. At first
glance, the idea of transforming a little-studied problem into a well-known one seems to be promising.
However, this approach has a very limited application. Indeed, it requires exact solutions of the obtained
TSP instances because even a near-optimal solution of such a TSP may correspond to an infeasible GTSP
solution. At the same time, the produced TSP instances have a rather unusual structure which is hard for
the existing TSP solvers. A more efficient approach to solve the GTSP exactly is the branch-and-bound
algorithm designed by Fischetti et al. (1997). By using this algorithm, the authors solve several instances
of size up to 89 clusters; solving larger instances to optimality is still too hard nowadays. Two approximation
algorithms for special cases of the GTSP were proposed in the literature; alas, the guaranteed
solution quality is rather low for the real-world applications, see (Bontoux et al., 2010) and references
therein.

*Corresponding author
Email addresses: daniel.karapetyan@gmail.com (D. Karapetyan), gutin@cs.rhul.ac.uk (G. Gutin)

Preprint submitted to Elsevier

In order to obtain good (but not necessarily exact) solutions for larger GTSP instances, one should consider
a heuristic approach. Several construction heuristics and local searches were discussed in (Bontoux et al.,
2010; Gutin and Karapetyan, 2010; Hu and Raidl, 2008; Renaud and Boctor, 1998; Snyder and Daskin,
2006) and some others. A number of metaheuristics were proposed by Bontoux et al. (2010); Gutin and Karapetyan
(2010); Gutin et al. (2008); Huang et al. (2005); Pintea et al. (2007); Silberholz and Golden (2007); Snyder and Daskin
(2006); Tasgetiren et al. (2007); Yang et al. (2008). However, none of these studies provides a review of
GTSP neighborhoods or discusses in detail different local search algorithms. Since most of the solution
methods applied to the GTSP are somehow based on local search, we believe that a deeper understanding of
this subject is of great importance.

In this paper, we define and analyze all known and some new GTSP neighborhoods and the corresponding
exploration algorithms. We consider only the classical local search, which is guaranteed to find
a local minimum within a certain neighborhood. Note that several GTSP neighborhoods were used in
(Gutin and Karapetyan, 2010; Gutin et al., 2008; Snyder and Daskin, 2006; Silberholz and Golden, 2007;
Tasgetiren et al., 2007), but they were not systematized or analyzed in detail. We aim to classify all known
and new neighborhoods and provide efficient exploration algorithms for all of them. Note that many of
the neighborhoods discussed below are already known from the literature but, because their exploration
algorithms were rather slow, some of them were considered practically useless. Our improvements, of
both heuristic and theoretical nature, dramatically speed up the exploration algorithms, making the
corresponding neighborhoods of practical interest.

In our classification, we divide all the GTSP neighborhoods into three classes: 

1. Cluster Optimization neighborhoods consist of solutions which differ from the original one in vertex
selection but have the same cluster order. This class is discussed in Section 2.

2. TSP-inspired neighborhoods are GTSP neighborhoods derived from TSP neighborhoods. Such
neighborhoods normally consist of solutions obtained from the original one by some global rearrangements
of the cluster order. The vertex selection within clusters may or may not be preserved
in these solutions. In Section 3.2 we show that there exist several ways to adapt an arbitrary
TSP neighborhood to the GTSP and propose a number of ways to make the exploration of these
adaptations efficient.

3. Fragment Optimization neighborhoods consist of solutions which are different from the original one
in some small tour fragment. Neighborhoods of this type were not widely used before. In Section 4
we propose two efficient algorithms for exploration of such neighborhoods.

Note that there exists another class of very successful local searches based on the Lin-Kernighan
idea (Karapetyan and Gutin, 2011a), but they are not discussed in this paper because they are not
'neighborhood-based.'

In this paper we use the following notation: 

• $n$ is the number of vertices in the graph.

• $m$ is the number of clusters.

• $s$ is the maximum cluster size. Obviously, $\lceil n/m \rceil \le s \le n - m + 1$.

• $\gamma$ is the minimum cluster size. Obviously, $1 \le \gamma \le \lfloor n/m \rfloor$.

• $\mathit{Cluster}(x)$ is the cluster containing vertex $x$.

• $w(v_1, v_2)$ is the weight of edge $(v_1, v_2)$.

• $w(v_1, v_2, \ldots, v_k) = w(v_1, v_2) + w(v_2, v_3) + \ldots + w(v_{k-1}, v_k)$.

• $w_{\min}(X_1, X_2, \ldots, X_k) = \min_{x_1 \in X_1, \ldots, x_k \in X_k} w(x_1, x_2, \ldots, x_k)$, where $x_i \in X_i$ and $X_i$ is a
set of vertices, $i = 1, 2, \ldots, k$. Function $w_{\max}(X_1, X_2, \ldots, X_k)$ is defined similarly.

• $T_i$ denotes the vertex at the $i$th position in tour $T$. We assume that $T_{i+m} = T_i$.

• Tour $T$ is also considered as a set of its edges, i.e., $T = \{(T_1, T_2), (T_2, T_3), \ldots, (T_{m-1}, T_m), (T_m, T_1)\}$.

• $\mathit{Turn}(T, x, y)$ denotes the tour obtained from $T$ by reversing the fragment $T_{x+1}, T_{x+2}, \ldots, T_y$:

$$\mathit{Turn}(T, x, y) = T_1, \ldots, T_x, \underbrace{T_y, T_{y-1}, \ldots, T_{x+1}}_{\text{reversed}}, T_{y+1}, \ldots, T_m, T_1 .$$

Observe that for a symmetric GTSP

$$\mathit{Turn}(T, x, y) = T \setminus \{(T_x, T_{x+1}), (T_y, T_{y+1})\} \cup \{(T_x, T_y), (T_{x+1}, T_{y+1})\}$$

and, hence, the weight of the obtained tour can be calculated in time $O(1)$:

$$w(\mathit{Turn}(T, x, y)) = w(T) - w(T_x, T_{x+1}) - w(T_y, T_{y+1}) + w(T_x, T_y) + w(T_{x+1}, T_{y+1}) . \qquad (1)$$
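The $\mathit{Turn}$ operation and the constant-time update (1) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 0-based positions and the representation of weights as a nested list `w[a][b]` are assumptions made here, and the shortcut is valid only for symmetric weights.

```python
def tour_weight(tour, w):
    """Weight of the closed tour; w[a][b] is the weight of going from a to b."""
    m = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % m]] for i in range(m))


def turn(tour, x, y):
    """Reverse the fragment at 0-based positions x+1..y (requires x < y)."""
    return tour[:x + 1] + tour[x + 1:y + 1][::-1] + tour[y + 1:]


def turn_delta(tour, w, x, y):
    """Change in tour weight caused by turn(tour, x, y), computed in O(1).

    Mirrors equation (1): two edges are removed and two are added.
    Correct only if w is symmetric, because the reversed fragment is
    traversed in the opposite direction.
    """
    m = len(tour)
    a, b = tour[x], tour[x + 1]          # removed edge (T_x, T_{x+1})
    c, d = tour[y], tour[(y + 1) % m]    # removed edge (T_y, T_{y+1})
    return w[a][c] + w[b][d] - w[a][b] - w[c][d]
```

A local search would evaluate `turn_delta` for each candidate pair and call `turn` only for the move it decides to apply.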

1.1. Experiments Prerequisites 

Although this paper does not suggest the 'best' GTSP local search, as a result of extensive computational
experiments, we select the most efficient exploration algorithms and compare different neighborhood
variations. In this section we discuss details of our experimentation techniques.



Our test bed includes several TSP instances taken from TSPLIB (Reinelt, 1991) and converted
to the GTSP by the standard clustering procedure of Fischetti, Salazar, and Toth (Fischetti et al.,
1997); the same approach is widely used in the literature, see, e.g., (Gutin and Karapetyan, 2010;
Silberholz and Golden, 2007; Snyder and Daskin, 2006; Tasgetiren et al., 2007). In particular, we use all
the instances with $10 \le m \le 217$ like in (Bontoux et al., 2010; Gutin and Karapetyan, 2010; Silberholz and Golden,
2007); in other papers the bounds are more restrictive. However, to save space, we usually include only
every fifth instance in our tables.

Every instance name in the test bed consists of three parts: '$m\,t\,n$', where $m$ is the number of clusters,
$t$ is the type of the original TSP instance (see (Reinelt, 1991) for details) and $n$ is the number of vertices.

Observe that the optimal solutions are known only for some instances with at most 89 clusters (Fischetti et al.,
1997). For the rest of the instances we use the best known solutions, see (Bontoux et al., 2010; Gutin and Karapetyan,
2010; Silberholz and Golden, 2007).

In order to generate the starting tour for the local search procedures, we use a simplified Nearest
Neighbor (Noon, 1988) construction heuristic. Unlike the algorithm proposed by Noon, our implementation
tries only one starting vertex. According to our experiments, trying every vertex as the starting
point significantly slows down the heuristic and has almost no influence on the quality of solutions obtained
after applying local search. Note that in what follows, the running time of a local search includes the
running time of the construction heuristic.

All the algorithms are implemented in Visual C++; the evaluation platform is based on an Intel Core i7
2.67 GHz processor.

1.2. Local Search Strategy 

Most commonly, one uses the first improvement local search strategy, i.e., applies an improvement 
as soon as it is found. Alternatively, one can use the best improvement strategy which first explores 
the whole neighborhood and then applies the best found improvement. Note that the first improvement 
strategy is normally faster while the best improvement strategy gives better solution quality. 

We implemented and tested both strategies for most of the algorithms discussed below. Our exper- 
iments show that the difference in solution quality between these two strategies is negligible while the 
running time of the best improvement is significantly higher. In what follows, we use the first improvement 
strategy. 
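The two strategies differ only in when the scan of the neighborhood stops. The following generic sketch makes this explicit; the function and parameter names (`local_search`, `neighbors`, `weight`) are placeholders chosen for illustration and do not come from the paper's code.

```python
def local_search(solution, neighbors, weight, first_improvement=True):
    """Descend until no neighbor is strictly better than the current solution.

    neighbors(s) yields candidate solutions; weight(s) returns the objective.
    With first_improvement=True the scan stops at the first better candidate,
    otherwise the whole neighborhood is scanned and the best one is taken.
    """
    current, current_w = solution, weight(solution)
    improved = True
    while improved:
        improved = False
        best, best_w = None, current_w
        for cand in neighbors(current):
            cand_w = weight(cand)
            if cand_w < best_w:
                best, best_w = cand, cand_w
                if first_improvement:
                    break
        if best is not None:
            current, current_w = best, best_w
            improved = True
    return current
```

Both variants stop at a local minimum of the same neighborhood; they may reach different local minima and need a different number of candidate evaluations.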



2. Cluster Optimization 

In this section we discuss GTSP neighborhood structures preserving the order of clusters in the tour.
Virtually, the smallest neighborhood of $T$ of this type is

$$N_L(T, i) = \{ (T_1, T_2, \ldots, T_{i-1}, T_i', T_{i+1}, \ldots, T_m, T_1) : T_i' \in \mathit{Cluster}(T_i) \} .$$

Its size is $|N_L(T, i)| = |\mathit{Cluster}(T_i)|$ and it takes $O(s)$ operations to explore it. One can extend it for two
or more clusters: $N_L(T, I)$, where $I$ is a set of cluster indices to be varied. The size of such a neighborhood
is $|N_L(T, I)| = \prod_{i \in I} |\mathit{Cluster}(T_i)|$.

Observe that it takes only $O(|I| s)$ operations to explore $N_L(T, I)$ if all the clusters selected in $I$ are
'independent', i.e., there is no $i$ such that $i \in I$ and $i + 1 \in I$. If $I = \{i, i + 1\}$, the neighborhood
$N_L(T, I)$ changes its structure. Now it takes $O(s^2)$ operations to explore it. One may assume that for
$I = \{i, i + 1, \ldots, i + k - 1\}$ the time complexity of the local search is $O(s^k)$. Next we will show that, in
fact, it takes only $O(k s^2)$ operations to find the best solution in such a neighborhood.

Let $(T_1, T_2, \ldots, T_m, T_1)$ be a tour and $I = \{i, i + 1, \ldots, i + k - 1\}$, where $k < m$. Let $\mathcal{T}_j = \mathit{Cluster}(T_j)$.
Construct a layered network as shown in Figure 1. Find the shortest path from $T_{i-1}$ to $T_{i+k}$ in this
network and update the vertices in the tour accordingly. This will yield the shortest tour
$T' \in N_L(T, \{i, i + 1, \ldots, i + k - 1\})$, and the time complexity of this algorithm is $O(k s^2)$.

[Figure: layered network with layers $T_{i-1}$, $\mathcal{T}_i$, $\mathcal{T}_{i+1}$, ..., $\mathcal{T}_{i+k-1}$, $T_{i+k}$]

Figure 1: In order to get the best tour in $N_L(T, \{i, i + 1, \ldots, i + k - 1\})$, construct a layered network as
shown here (all the weights in this network correspond to the original weights in the GTSP instance) and
find the shortest path from $T_{i-1}$ to $T_{i+k}$.
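The layered-network search for $k$ consecutive clusters can be sketched as a simple forward dynamic program. The sketch below is illustrative only: the names `optimize_fragment`, `clusters` (list of vertex lists) and `cluster_of` (vertex to cluster index) are assumptions, positions are 0-based, and $1 \le k < m$ is required so that both end vertices stay fixed.

```python
def optimize_fragment(tour, clusters, cluster_of, w, i, k):
    """Best vertex choice for positions i..i+k-1 (cyclic), cluster order fixed.

    Runs a shortest-path computation from tour[i-1] through the k cluster
    layers to tour[i+k]; each layer transition costs O(s^2), so the total
    is O(k s^2).
    """
    m = len(tour)
    assert 1 <= k < m
    start, end = tour[(i - 1) % m], tour[(i + k) % m]
    layers = [clusters[cluster_of[tour[(i + j) % m]]] for j in range(k)]

    dist = {v: w[start][v] for v in layers[0]}   # shortest distances to layer 0
    preds = []                                   # predecessor maps, one per transition
    for layer in layers[1:]:
        new_dist, pred = {}, {}
        for r in layer:
            u = min(dist, key=lambda u: dist[u] + w[u][r])
            new_dist[r], pred[r] = dist[u] + w[u][r], u
        dist = new_dist
        preds.append(pred)

    # Close the path at the fixed vertex after the fragment, then backtrack.
    v = min(dist, key=lambda u: dist[u] + w[u][end])
    chosen = [v]
    for pred in reversed(preds):
        v = pred[v]
        chosen.append(v)
    chosen.reverse()

    result = list(tour)
    for j, v in enumerate(chosen):
        result[(i + j) % m] = v
    return result
```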

Consider the case where $k = m$. This is the largest neighborhood of this type and we denote it
$N_{CO}(T) = N_L(T, \{1, 2, \ldots, m\})$. Since $N_{CO}(T)$ does not fix any vertices, it is now impossible to use
straightforwardly the optimization technique shown above. However, the problem of finding the shortest
tour $T' \in N_{CO}(T)$ can be reduced to several problems of finding the shortest tour in $N_L(T, \{2, \ldots, m\})$.
For every $v \in \mathit{Cluster}(T_1)$ find the shortest tour $T'^v \in N_L(T^v, \{2, 3, \ldots, m\})$, where $T^v = (v, T_2, T_3, \ldots, T_m)$.
The shortest tour among $T'^v$ is the shortest tour $T' \in N_{CO}(T)$. The procedure takes $O(m s^3)$ operations.
In what follows, we call this algorithm Cluster Optimization (CO).

CO was introduced by Fischetti et al. (1997) (for a detailed description see also (Fischetti et al., 2002))
and used in (Gutin and Karapetyan, 2010; Gutin et al., 2008; Hu and Raidl, 2008; Pintea et al., 2007;
Renaud and Boctor, 1998) and others.

A formal implementation of CO is presented in Algorithm 1. Note that $N_{CO}(T') = N_{CO}(T)$ for any
$T' \in N_{CO}(T)$ and, thus, unlike usual local search procedures, CO does not need to be run several times
to get the local minimum.



2.1. Cluster Optimization Refinements 

In this section we discuss several improvements that can noticeably reduce the running time of CO. 



Algorithm 1 Cluster Optimization. Basic implementation.

Require: Tour $T = (T_1, T_2, \ldots, T_m)$.
Let $\mathcal{T}_i = \mathit{Cluster}(T_i)$ for every $i$.
for all $v \in \mathcal{T}_1$ and $r \in \mathcal{T}_2$ do
    Initialize the shortest path from $v$ to $r$: $p_{v,r} \leftarrow (v, r)$.
for $i \leftarrow 3, 4, \ldots, m$ do
    for all $v \in \mathcal{T}_1$ and $r \in \mathcal{T}_i$ do
        Set $p_{v,r} \leftarrow p_{v,u} \cup (u, r)$, where $u \in \mathcal{T}_{i-1}$ is selected to minimize $w(p_{v,u} \cup (u, r))$.
return $p_{v,r} \cup (r, v)$, where $v \in \mathcal{T}_1$ and $r \in \mathcal{T}_m$ are selected to minimize $w(p_{v,r} \cup (r, v))$.
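A direct transcription of the basic CO procedure into Python might look as follows. It is a sketch under assumptions: `clusters` is a list of vertex lists, `cluster_of` maps a vertex to its cluster index, `w` is a nested list of weights, and the function name is chosen here for illustration. It keeps predecessor maps so that the tour itself, not only its weight, is recovered.

```python
def cluster_optimization(tour, clusters, cluster_of, w):
    """Return (weight, tour) of the shortest tour with the same cluster order.

    For every vertex v of the first cluster, a layered shortest-path search
    is run through the remaining clusters and closed back at v.
    """
    layers = [clusters[cluster_of[v]] for v in tour]
    best_w, best_tour = float("inf"), list(tour)
    for v in layers[0]:
        dist = {v: 0}      # shortest distances from v to the current layer
        preds = []         # predecessor maps, one per layer transition
        for layer in layers[1:]:
            new_dist, pred = {}, {}
            for r in layer:
                u = min(dist, key=lambda u: dist[u] + w[u][r])
                new_dist[r], pred[r] = dist[u] + w[u][r], u
            dist = new_dist
            preds.append(pred)
        # Close the cycle by returning to the same start vertex v.
        r = min(dist, key=lambda u: dist[u] + w[u][v])
        total = dist[r] + w[r][v]
        if total < best_w:
            path = [r]
            for pred in reversed(preds):
                path.append(pred[path[-1]])
            best_w, best_tour = total, path[::-1]
    return best_w, best_tour
```

Rotating the tour so that the smallest cluster comes first before calling such a routine shortens the outer loop without changing the result.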



2.1.1. First Cluster Selection 

Observe (see Algorithm 1) that the time complexity of CO grows linearly with the size of cluster $\mathcal{T}_1$.
Thus, before applying CO, we rotate the solution such that $|\mathcal{T}_1| = \gamma$. This technique, which was widely
used in the literature, reduces the time complexity of the algorithm to $O(m \gamma s^2)$.

Note that $N_{CO}(T)$ is a 'very large neighborhood' since it is of an exponential size and there exists
a polynomial exploration algorithm for it. Sometimes, neighborhoods of this class are very effective
(Gutin and Karapetyan, 2009b).



2.1.2. First Cluster Reduction 

Since the running time of CO significantly depends on the size $\gamma$ of the smallest cluster, it is worth
checking whether we can reduce its size. Some attempts to reduce the cluster sizes in the GTSP were
proposed by Gutin and Karapetyan (2009a). The idea was to remove a vertex $r \in R$, where $R$ is a cluster,
if for every pair of vertices $v$ and $u$, $\mathit{Cluster}(v) \ne \mathit{Cluster}(u) \ne R$, there exists some $r' \in R \setminus \{r\}$ such
that $w(v, r', u) \le w(v, r, u)$.

In our case, the reduction can be significantly more efficient. Indeed, we do not need to consider all
$u$ and $v$. Let $R = \mathcal{T}_1$. Then consider only $u \in \mathcal{T}_m$ and $v \in \mathcal{T}_2$.

A straightforward reduction algorithm would take $O(s^2 \gamma^2)$ operations. We propose Algorithm 2 which
reduces the cluster $\mathcal{T}_1$ in $O(s^2 \gamma)$ time. One can try to apply this procedure to reduce every cluster but
this would likely slow down the CO algorithm. We apply this reduction only to the smallest cluster
$\mathcal{T}_1 = \mathit{Cluster}(T_1)$ as shown in Algorithm 2. Moreover, we never apply this reduction if $|\mathcal{T}_m| |\mathcal{T}_2| > n$.
Indeed, in the best case, CO takes only $O(\gamma n)$ operations (consider, e.g., the case when $|\mathcal{T}_{2i-1}| = \gamma$ for
every $i$) so it is unreasonable to run the reduction if its time complexity is more than $O(\gamma n)$.

Algorithm 2 Reduction of a cluster in a tour.

Require: Tour $T = (T_1, T_2, \ldots, T_m, T_1)$, where $|\mathit{Cluster}(T_1)| = \gamma$.
Let $U = \mathit{Cluster}(T_m)$, $R = \mathit{Cluster}(T_1)$ and $V = \mathit{Cluster}(T_2)$.
for all $u \in U$ and $v \in V$ do
    Find the shortest distance $l_{u,v} \leftarrow \min_{r \in R} w(u, r, v)$.
    Find the number $c_{u,v}$ of paths $(u, r, v)$ such that $w(u, r, v) = l_{u,v}$, i.e.,
    $c_{u,v} \leftarrow |\{r : r \in R \text{ and } w(u, r, v) = l_{u,v}\}|$.
for all $r \in R$ do
    for all $u \in U$ and $v \in V$ do
        if $w(u, r, v) = l_{u,v}$ and $c_{u,v} = 1$ then
            Go to the next $r$.
    for all $u \in U$ and $v \in V$ do
        if $w(u, r, v) = l_{u,v}$ then
            Update $c_{u,v} \leftarrow c_{u,v} - 1$.
    Remove $r$ from $R$.

Note that this reduction is valid only for a certain cluster order and, hence, the cluster $\mathcal{T}_1$ must be
restored after the run of CO.
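The reduction of Algorithm 2 can be sketched in a few lines. The function below is an illustration under assumptions (the name `reduce_cluster`, list-based clusters and a nested-list weight matrix are chosen here); it returns the surviving vertices instead of modifying the cluster in place, which also makes restoring the cluster afterwards trivial. The equality tests assume exactly comparable weights such as integers.

```python
def reduce_cluster(U, R, V, w):
    """Return the vertices of R kept by the reduction.

    U and V are the clusters preceding and following R in the tour.
    A vertex r is dropped unless it is the only remaining vertex that
    attains the shortest (u, r, v) path for some pair (u, v).
    Runs in O(|U| * |V| * |R|) time.
    """
    def through(u, r, v):
        return w[u][r] + w[r][v]

    pairs = [(u, v) for u in U for v in V]
    shortest = {(u, v): min(through(u, r, v) for r in R) for u, v in pairs}
    count = {(u, v): sum(1 for r in R if through(u, r, v) == shortest[u, v])
             for u, v in pairs}

    kept = []
    for r in R:
        if any(through(u, r, v) == shortest[u, v] and count[u, v] == 1
               for u, v in pairs):
            kept.append(r)          # r is indispensable for some pair
            continue
        for u, v in pairs:          # r is removed: update the counters
            if through(u, r, v) == shortest[u, v]:
                count[u, v] -= 1
    return kept
```

By construction, the shortest $(u, r, v)$ distance for every pair is the same over the kept vertices as over the original cluster.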



2.1.3. Calculations Order 

The procedure of finding the shortest paths in a layered network can be described as follows. Assume
that the layers of the network are $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m, \mathcal{T}_1'$, where $\mathcal{T}_1'$ is a copy of $\mathcal{T}_1$, and the objective is to find
all the shortest $(v, v')$-paths from every $v \in \mathcal{T}_1$ to its copy $v' \in \mathcal{T}_1'$. Observe that removing any layer $\mathcal{T}_i$,
$1 < i \le m$, and adding edges from every $u \in \mathcal{T}_{i-1}$ to every $v \in \mathcal{T}_{i+1}$ such that $w(u, v) = \min_{r \in \mathcal{T}_i} w(u, r, v)$
preserves the lengths of the shortest $(v, v')$-paths. After repeating this procedure $m - 2$ times, we get
exactly three layers $\mathcal{T}_1$, $\mathcal{T}_2$ and $\mathcal{T}_1'$ such that $\min_{r \in \mathcal{T}_2} w(v, r, v')$ is the length of the shortest $(v, v')$-path
(we assume that the layers are renumbered after every iteration). This interpretation is exploited in
Algorithm 3.

Algorithm 3 Sequential CO implementation. This algorithm is equivalent to Algorithm 1.

Require: Network layers $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m, \mathcal{T}_{m+1}$, where $\mathcal{T}_{m+1} = \mathcal{T}_1$.
for $i \leftarrow 1, 2, \ldots, m - 2$ do
    Set $w(u, v) \leftarrow \min_{r \in \mathcal{T}_2} w(u, r, v)$ for every $u \in \mathcal{T}_1$ and $v \in \mathcal{T}_3$.
    Remove layer $\mathcal{T}_2$; renumber the layers accordingly.
return $\min_{v \in \mathcal{T}_1, r \in \mathcal{T}_2} w(v, r, v)$.

Observe that Algorithm 3 removes the layers sequentially but this can be done in an arbitrary order.
A generalized dynamic programming implementation of CO can be described as in Algorithm 4.

Algorithm 4 A generalized dynamic programming implementation of CO.

Require: Network layers $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m, \mathcal{T}_{m+1}$, where $\mathcal{T}_{m+1} = \mathcal{T}_1$.
for $i \leftarrow 1, 2, \ldots, m - 2$ do
    Set $w(u, v) \leftarrow \min_{r \in \mathcal{T}_{\lambda_i}} w(u, r, v)$ for every $u \in \mathcal{T}_{\lambda_i - 1}$ and $v \in \mathcal{T}_{\lambda_i + 1}$.
    Remove the layer $\mathcal{T}_{\lambda_i}$; renumber the layers accordingly.
return $\min_{v \in \mathcal{T}_1, r \in \mathcal{T}_2} w(v, r, v)$.

Here $\lambda$ is a sequence of $m - 2$ numbers, $1 < \lambda_i \le m - i + 1$. It defines the algorithm's behavior: on the
$i$th iteration the algorithm removes cluster $\mathcal{T}_{\lambda_i}$ from the sequence by calculating the shortest paths from
$\mathcal{T}_{\lambda_i - 1}$ to $\mathcal{T}_{\lambda_i + 1}$. Note that Algorithms 4 and 3 coincide when $\lambda = (2, 2, \ldots, 2)$.
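Layer removal is a min-plus product of two adjacent weight matrices, so the generalized scheme can be sketched compactly. The code below is an illustrative sketch, not the paper's implementation: `co_weight`, `min_plus` and the parameter `order` (the sequence $\lambda$ of 1-based positions) are names assumed here, and the function returns only the weight of the shortest cycle.

```python
def min_plus(A, B):
    """Min-plus product: C[u][v] = min over r of A[u][r] + B[r][v]."""
    return [[min(A[u][r] + B[r][v] for r in range(len(B)))
             for v in range(len(B[0]))]
            for u in range(len(A))]


def co_weight(layers, w, order=None):
    """Weight of the shortest cycle through the layers, in the given order.

    layers: list of m >= 2 vertex lists; w: nested list of weights.
    order: m - 2 positions (1-based, in the current numbering) of the layers
    to remove; None means (2, 2, ..., 2), i.e. the sequential variant.
    """
    m = len(layers)
    seq = layers + [layers[0]]      # the last layer is a copy of the first
    # mats[j] holds the weights between consecutive layers seq[j], seq[j+1].
    mats = [[[w[u][v] for v in seq[j + 1]] for u in seq[j]] for j in range(m)]
    for pos in (order if order is not None else [2] * (m - 2)):
        j = pos - 1                 # 0-based index of the layer to remove
        mats[j - 1:j + 1] = [min_plus(mats[j - 1], mats[j])]
    first, last = mats              # three layers remain
    # The cycle must end in the copy of the vertex it started from.
    return min(first[a][r] + last[r][a]
               for a in range(len(first)) for r in range(len(last)))
```

Any valid `order` gives the same result; only the amount of work differs.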

Let us count the number of times Algorithm 3 obtains an edge weight (we will call it a weight operation).
This number adequately reflects the running time of an implementation.
In general, Algorithm 4 requires

$$t_{\text{general}} = 2 \cdot \left[ |\mathcal{T}_1| |\mathcal{T}_k| + \sum_{i=1}^{m-2} |\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \right] \text{ weight operations,} \qquad (2)$$

where $x$, $y$ and $z$ are ordered lists and $1 < k \le m$, all derived from $\lambda$ (we had to introduce these indices
because of renumbering performed on every iteration of the algorithm). Note that in (2), the expression
in brackets is the number of 3-vertex paths considered by Algorithm 4 and the factor 2 is the number of
weight operations per path. Without loss of generality, let $x_i < y_i < z_i$.

Algorithm 3 always removes the second layer in the current sequence of layers, i.e., the number of
weight operations required for the sequential algorithm is as follows:

$$t_{\text{seq}} = 2 \cdot \left[ |\mathcal{T}_1| |\mathcal{T}_m| + \sum_{i=1}^{m-2} |\mathcal{T}_1| |\mathcal{T}_{i+1}| |\mathcal{T}_{i+2}| \right] . \qquad (3)$$



Consider the following example. Let $m$ be even, $|\mathcal{T}_{2i}| = z > 1$ and $|\mathcal{T}_{2i-1}| = 1$ for every $i =
1, 2, \ldots, m/2$. According to (3), the sequential algorithm performs $2(m - 1)z$ weight operations. Consider
the general implementation Algorithm 4 with $\lambda = (2, 3, \ldots, \frac{m}{2} + 1, 2, 2, \ldots, 2)$. It starts from removing all
the layers of size $z$ and then acts as the sequential algorithm. Observe that it requires only $mz + m - 2$
weight operations. Hence, the asymptotic ratio is:

$$\lim_{m \to \infty} \lim_{z \to \infty} \frac{2(m - 1)z}{mz + m - 2} = \lim_{m \to \infty} 2 \cdot \frac{m - 1}{m} = 2 .$$



Note that the weight operations ratio between the sequential calculation and the improved one can be 
significant in practice. Even for the modest values m = 7 and 2 = 7 in this example the ratio is 1.5. 
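The counts in this example are easy to reproduce by simulating the removal of layers on the cluster sizes alone. The helper names below (`weight_operations`, `example_sizes`, `improved_order`) are hypothetical and introduced only for this sketch; the counting rule follows formulas (2) and (3), i.e. two weight operations per 3-vertex path.

```python
def weight_operations(sizes, order=None):
    """Number of weight operations for a given removal order.

    sizes: layer sizes |T_1|, ..., |T_m|; order: 1-based positions of the
    layers to remove (m - 2 entries), None for the sequential order.
    """
    m = len(sizes)
    seq = list(sizes) + [sizes[0]]   # append the copy of the first layer
    paths = 0
    for pos in (order if order is not None else [2] * (m - 2)):
        j = pos - 1
        paths += seq[j - 1] * seq[j] * seq[j + 1]   # 3-vertex paths examined
        del seq[j]
    paths += seq[0] * seq[1]         # final step: one cycle per (v, r) pair
    return 2 * paths


def example_sizes(m, z):
    """Sizes from the example: odd layers have size 1, even layers size z."""
    return [1 if i % 2 == 0 else z for i in range(m)]


def improved_order(m):
    """Remove the m/2 layers of size z first, then proceed sequentially."""
    return list(range(2, m // 2 + 2)) + [2] * (m // 2 - 2)
```

For instance, with $m = 6$ and $z = 6$ the sequential order gives 60 weight operations and the improved order gives 40, in agreement with $2(m - 1)z$ and $mz + m - 2$.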

The natural question that arises is how much it is possible to speed up the sequential algorithm by 
changing the calculation order. 

Theorem 1. Let the first layer in a layered network be the smallest one. Then the sequential implementation
of CO (see Algorithm 3) is at most 2 times slower than the optimal dynamic programming
algorithm (see Algorithm 4), and this bound is asymptotically sharp.

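Proof. Let $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m, \mathcal{T}_{m+1} = \mathcal{T}_1$ be the layers of the network. Let $2 < k < m$ (see (2)). For every
$j = 1, 2, \ldots, m$, equation (2) contains a term $|\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}|$ such that either $x_i = j$ and $y_i = j + 1$ or
$y_i = j$ and $z_i = j + 1$. Indeed, it is impossible to calculate the shortest paths in a layered network without
consideration of weights between every pair of consecutive layers. Note that $|\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \ge \gamma |\mathcal{T}_j| |\mathcal{T}_{j+1}|$
if $|\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}|$ contains $|\mathcal{T}_j| |\mathcal{T}_{j+1}|$. Observe also that a term $|\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}|$ may contain both $|\mathcal{T}_j| |\mathcal{T}_{j+1}|$
and $|\mathcal{T}_{j+1}| |\mathcal{T}_{j+2}|$. Based on this, we can provide the following lower bound:

$$\sum_{i=1}^{m-2} |\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \ge \frac{\gamma}{2} \sum_{i=1}^{m} |\mathcal{T}_i| |\mathcal{T}_{i+1}| . \qquad (4)$$

Observe that

$$\gamma \sum_{i=1}^{m} |\mathcal{T}_i| |\mathcal{T}_{i+1}| = |\mathcal{T}_1| |\mathcal{T}_m| |\mathcal{T}_{m+1}| + \sum_{i=1}^{m-1} |\mathcal{T}_1| |\mathcal{T}_i| |\mathcal{T}_{i+1}| \ge \frac{1}{2} t_{\text{seq}} .$$

Hence, $t_{\text{general}} \ge \frac{1}{2} t_{\text{seq}}$.

If $k = 2$, weights between $\mathcal{T}_1$ and $\mathcal{T}_2$ are considered in the last line of Algorithm 4 and, hence, (4)
must be replaced with

$$\sum_{i=1}^{m-2} |\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \ge \frac{\gamma}{2} \sum_{i=2}^{m} |\mathcal{T}_i| |\mathcal{T}_{i+1}| ,$$

that does not change the outcome.

If $k = m$, (4) must be replaced with

$$\sum_{i=1}^{m-2} |\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \ge \frac{\gamma}{2} \sum_{i=1}^{m-1} |\mathcal{T}_i| |\mathcal{T}_{i+1}| .$$

In this case

$$t_{\text{general}} = 2 \cdot \left[ |\mathcal{T}_1| |\mathcal{T}_m| + \sum_{i=1}^{m-2} |\mathcal{T}_{x_i}| |\mathcal{T}_{y_i}| |\mathcal{T}_{z_i}| \right] \ge 2 |\mathcal{T}_1| |\mathcal{T}_m| + \gamma \sum_{i=1}^{m-1} |\mathcal{T}_i| |\mathcal{T}_{i+1}| \ge \frac{1}{2} t_{\text{seq}} .$$

The example before the theorem implies that the bound $t_{\text{general}} \ge \frac{1}{2} t_{\text{seq}}$ is asymptotically sharp. □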

It is not hard to see that the number of distinct dynamic programming implementations of CO is
exponential in $m$, and it is usually impractical to search for the optimal calculations order. Instead, we
propose a simple heuristic that improves the sequential algorithm. On every iteration, our heuristic looks
one step ahead; if the condition

$$|\mathcal{T}_1| |\mathcal{T}_2| |\mathcal{T}_3| + |\mathcal{T}_1| |\mathcal{T}_3| |\mathcal{T}_4| > |\mathcal{T}_2| |\mathcal{T}_3| |\mathcal{T}_4| + |\mathcal{T}_1| |\mathcal{T}_2| |\mathcal{T}_4| , \qquad (5)$$

is satisfied for the current numbering of clusters, then it removes cluster $\mathcal{T}_3$ before removing $\mathcal{T}_2$; otherwise
it removes $\mathcal{T}_2$ and proceeds to the next iteration. For details see Algorithm 5.

Note that Algorithms 3, 4 and 5 find the shortest cycle weight but not the shortest cycle itself. It will
be shown below that it is usually required to find only the weight of the shortest cycle. In the rare cases
that we need the shortest cycle itself, we use the basic sequential implementation (Algorithm 1).



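Algorithm 5 Cluster Optimization with an improved order of calculations.

Require: Tour $T = (T_1, T_2, \ldots, T_m, T_1)$, where $|\mathit{Cluster}(T_1)| = \gamma$.
Let $\mathcal{T}_i = \mathit{Cluster}(T_i)$ for every $i$.
for $i \leftarrow 2, 3, \ldots, m - 1$ do
    if $i < m - 1$ and $|\mathcal{T}_1| |\mathcal{T}_i| |\mathcal{T}_{i+1}| + |\mathcal{T}_1| |\mathcal{T}_{i+1}| |\mathcal{T}_{i+2}| > |\mathcal{T}_i| |\mathcal{T}_{i+1}| |\mathcal{T}_{i+2}| + |\mathcal{T}_1| |\mathcal{T}_i| |\mathcal{T}_{i+2}|$ then
        Calculate the shortest paths from $\mathcal{T}_i$ to $\mathcal{T}_{i+2}$.
        Calculate the shortest paths from $\mathcal{T}_1$ to $\mathcal{T}_{i+2}$.
        Set the weights between $\mathcal{T}_1$ and $\mathcal{T}_{i+2}$ to the calculated values.
        Set $i \leftarrow i + 1$.
    else
        Calculate the shortest paths from $\mathcal{T}_1$ to $\mathcal{T}_{i+1}$.
        Set the weights between $\mathcal{T}_1$ and $\mathcal{T}_{i+1}$ to the calculated values.
return $\min_{v \in \mathcal{T}_1, r \in \mathcal{T}_m} w(v, r, v)$.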
(a) Instances with $\gamma = 1$.

5.6  27.4

Average  2.1  16.1  6.9  6.3  4.5  4.3

(b) Instances with $\gamma > 1$.

Table 1: Experimental results for the different variations of CO. Note that here and below we show
the average values in every table. These should not be considered as the main performance indicators
because sometimes they are too heavily biased toward the results obtained for large instances. However, large
instances are of most interest and, thus, averages, being properly understood, can be helpful in analysing
experimental results.



2.2. Computational Experiments 

In order to check the efficiency of the proposed improvements, we provide the results of computational
experiments in Tables 1a and 1b. Table 1a includes only the instances with $\gamma = 1$ (to save space, every
fifth instance is taken) while Table 1b includes all the instances with $\gamma > 1$.

All the implementations $CO_1$, $CO_2$, $CO_3$ and $CO_4$ apply the first refinement (Section 2.1.1), i.e., rotate the tour
such that $|\mathit{Cluster}(T_1)| = \gamma$. In addition, $CO_2$ and $CO_4$ optimize the calculations order according to
Algorithm 5, and $CO_3$ and $CO_4$ try to reduce the size of the smallest cluster according to Algorithm 2.

In spite of the fact that all the instances in the test bed have small $\gamma$ (the largest $\gamma$ in the test bed
is 3), the experiments clearly show that the cluster reduction technique is very efficient (see the results
for $CO_3$ and $CO_4$). It was able to significantly improve the running times for almost every instance in
Table 1b (these implementations are obviously not included in Table 1a).

The optimized calculations order is also beneficial, but less so. It is more efficient when $\gamma > 1$
(moreover, if $\gamma = 1$, it often slows down the algorithm). Indeed, it is easy to show that if $\gamma = 1$ then, in
order to meet (5), either $\mathcal{T}_2$ or $\mathcal{T}_4$ should be of size 1. Hence, if $\gamma = 1$, this improvement can be applied
quite rarely and only in some relatively easy cases.



We conclude that the proposed refinements are usually insignificant if $\gamma = 1$ but they are very efficient
if $\gamma > 1$. In what follows, we use a hybrid implementation of CO, see Algorithm 6.

Algorithm 6 Hybrid implementation of CO.

Require: Tour $T = (T_1, T_2, \ldots, T_m, T_1)$.
Rotate the tour $T$ such that $|\mathit{Cluster}(T_1)| = \gamma$.
if $\gamma > 1$ then
    Reduce cluster $\mathcal{T}_1$ (see Algorithm 2).
if $\gamma = 1$ then
    Apply sequential implementation of CO (see Algorithm 3).
else
    Apply CO with improved calculations order (see Algorithm 5).
return $T$.



3. TSP-inspired Neighborhoods 

Since the GTSP is an extension of the TSP, it is natural to use TSP neighborhood adaptations for
the GTSP. In this section we discuss different ways to adapt a TSP neighborhood for the GTSP. These
approaches are later applied to the most efficient TSP neighborhoods. Note that some of these ideas are
presented in (Karapetyan and Gutin, 2011a) but in this study they are generalized, further developed
and discussed in detail.

It is worth saying that the adaptation of a TSP neighborhood for the GTSP is not as straightforward
as it may seem to be. Among other approaches, Renaud and Boctor (1998) propose decomposing the GTSP
into two problems: solving the TSP instance induced by the given tour to find the cluster order and
then applying the CO algorithm to it (see Section 2). We will show now that this method is generally poor
with regard to solution quality. Let $N_{\text{ind}}(T)$ be the set of tours which can be obtained from the tour $T$ by
reordering vertices in $T$. Observe that one has to solve a TSP instance induced by $T$ to find the best
tour in $N_{\text{ind}}(T)$. Let $N_{CO}(T)$ be the neighborhood of the CO local search (see Section 2).

The following theorem shows that decomposing the GTSP into two problems (iteratively search
in $N_{\text{ind}}(T)$ and then search in $N_{CO}(T)$) does not guarantee any solution quality. For a proof, see
Karapetyan and Gutin (2011a).

Theorem 2. The best tour among $N_{CO}(T) \cup N_{\text{ind}}(T)$ can be a longest GTSP tour different from a
shortest one.
3.1. TSP Neighborhoods 

In order to continue this discussion, let us briefly list the most well-known TSP neighborhoods. Here 
we assume that m is the number of vertices in the TSP instance. 

k-opt is the most general TSP neighborhood. It includes all the tours that are different from the given
one in at most $k$ edges. Obviously any tour can be obtained from a given one by an $m$-opt move.

Insertion neighborhood includes all the tours that can be obtained from the given one by removing a
vertex and inserting it at some other position. It can be viewed as a special case of 3-opt.

Swap (also known as Exchange) neighborhood includes all the tours that can be obtained from the
given one by swapping two vertices. It can be viewed as a special case of 4-opt.

Lin-Kernighan is a sophisticated heuristic exploring some areas of $k$-opt without fixing the value of $k$.
It does not have any certain neighborhood and, thus, is not considered in this paper.

For more information on these and some other TSP local searches, see, e.g., (Johnson and McGeoch,
2002; Johnson et al., 2002).



Algorithm 7 Typical local search with neighborhood $N(T)$.

Require: Solution $T$.
for all $T' \in N(T)$ do
    if $w(T') < w(T)$ then
        $T \leftarrow T'$.
        Rerun the for loop again.
return $T$.



3.2. Adaptation of TSP local search for GTSP 

A typical local search with a neighborhood $N(T)$ is shown in Algorithm 7. Let $N_1(T) \subset N_{\text{ind}}(T)$ be a
neighborhood of some TSP local search $LS_1(T)$. Let $N_2(T) \subset N_{CO}(T)$ be a neighborhood of the Cluster
Optimization class and $LS_2(T)$ an exploration algorithm for it. Then one can think of the following two
ways to combine these local searches in one GTSP local search:

(i) Enumerate all candidates $T' \in N_1(T)$. For every candidate $T'$ find $T'' \leftarrow LS_2(T')$ to optimize it in
$N_2(T')$. If $w(T'') < w(T)$, replace $T$ with $T''$ and continue.

(ii) Enumerate all candidates $T' \in N_2(T)$. For every candidate $T'$ find $T'' \leftarrow LS_1(T')$ to optimize it in
$N_1(T')$. If $w(T'') < w(T)$, replace $T$ with $T''$ and continue.

Observe that the neighborhood $N_1(T)$ is normally much harder to explore than the cluster optimization
neighborhood $N_2(T)$. Consider, e.g., $N_1(T) = N_{\text{ind}}(T)$ and $N_2(T) = N_{CO}(T)$. Then both options
yield an optimal GTSP solution but Option (i) requires only $O(m s^3 \cdot (m - 1)!)$ operations while Option (ii)
requires $O(s^m \cdot (m - 1)!)$ operations.

Moreover, many practical applications of the GTSP have some localization of clusters, i.e., typically,
$|w(x, y_1) - w(x, y_2)| \ll w(x, y_1)$ if $\mathit{Cluster}(y_1) = \mathit{Cluster}(y_2) \ne \mathit{Cluster}(x)$. Hence, the dependency of
the $N_2(T)$ landscape on the cluster order is higher than the dependency of the $N_1(T)$ landscape on the
vertex selection and, thus, Option (i) is preferable.



Option (ii) was used by Hu and Raidl (2008). Note that using $N_2(T) = N_{CO}(T)$ would lead to a
non-polynomial algorithm; the cluster optimization neighborhood $N_2(T)$ they use includes only the tours
which differ from $T$ in exactly one vertex. For every $T' \in N_2(T)$, the Chained Lin-Kernighan heuristic is
applied. This results in $n$ runs of Chained Lin-Kernighan, which makes the algorithm unreasonably slow
while the vertex selection is given very little freedom.

Option (i) may be improved as shown in Algorithm 8.

Algorithm 8 Improved adaptation of a TSP neighborhood for the GTSP according to Option (i).

Require: Tour $T$.
for all $T' \in N_1(T)$ do
    $T' \leftarrow \mathit{QuickImprove}(T')$.
    if $w(T') < w(T)$ then
        $T \leftarrow \mathit{SlowImprove}(T')$.
        Rerun the for loop again.
return $T$.

Here $\mathit{QuickImprove}(T)$ and $\mathit{SlowImprove}(T)$
are some tour improvement heuristics of the Cluster Optimization class. Formally, these heuristics should
meet the following requirements:

• $\mathit{QuickImprove}(T), \mathit{SlowImprove}(T) \in N_{CO}(T)$ for any tour $T$;

• $w(\mathit{QuickImprove}(T)) \le w(T)$ and $w(\mathit{SlowImprove}(T)) \le w(T)$ for any tour $T$.
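The control flow of this adaptation scheme can be sketched with the two improvement heuristics passed in as functions. The sketch below is illustrative: the name `adapted_local_search` and its parameters are assumptions, `neighborhood(tour)` stands for an enumeration of $N_1(T)$, and both heuristics are expected to satisfy the two requirements listed above.

```python
def adapted_local_search(tour, neighborhood, weight, quick_improve, slow_improve):
    """Explore a TSP neighborhood, improving candidates within N_CO.

    quick_improve is applied to every candidate before it is evaluated;
    slow_improve is applied only to candidates that beat the current tour.
    The scan restarts from the new tour after every accepted improvement.
    """
    current_w = weight(tour)
    improved = True
    while improved:
        improved = False
        for cand in neighborhood(tour):
            cand = quick_improve(cand)
            if weight(cand) < current_w:
                tour = slow_improve(cand)
                current_w = weight(tour)
                improved = True
                break   # rerun the loop over the new neighborhood
    return tour
```

Passing the identity function for both heuristics recovers a plain TSP local search on the vertices of the current tour.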

QuickImprove is applied to every candidate $T'$ before its evaluation. SlowImprove is only applied to
successful candidates in order to further improve them. One can think of the following implementations
of QuickImprove and SlowImprove:
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Trivial I{T) which leaves the solution without any change: I{T) — T. 



• Local cluster optimization L{T) — L{T,I), see Section [51 It updates vertices only within clus- 
ters Ci, i <E I, affected by the latest solution change. E.g., if a tour (xi,X2,x^,Xi,xi) was 
changed to {xi,X'i,X2,XA,xi), we can use L(r, {2,3}) which will yield the best solution among 
{xi^x''^,x'2,Xi^xi), where x'2 G Cluster{x2) and x'^ G Cluster[xz)- The time complexity of L[T) is 
0(|J|s) or 0(|7|s^), depending on the affected clusters. 

• Global cluster optimization CO{T) which applies the CO algorithm to the given solution. The time 
complexity of CO is 0(7175). 
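As a rough illustration of the global cluster optimization step, the following Python sketch finds the best vertex selection for a fixed cluster order: it rotates the smallest cluster to the first layer and, for each of its vertices, sweeps through the remaining layers once. The function name `cluster_optimization`, the list-of-lists cluster representation and the weight callable `w` are assumptions made for this example; it is not the authors' implementation.

```python
def cluster_optimization(clusters, w):
    """Return (weight, tour) of the shortest cycle that visits one vertex per
    cluster in the given cyclic order. `clusters` is a list of vertex lists,
    `w(u, v)` is the edge weight. The tour starts in the smallest cluster."""
    m = len(clusters)
    # Rotate the cyclic order so that the smallest cluster becomes layer 0.
    r = min(range(m), key=lambda i: len(clusters[i]))
    layers = clusters[r:] + clusters[:r]
    best_weight, best_tour = float("inf"), None
    for start in layers[0]:
        dist = {start: 0.0}          # shortest path weights from `start`
        parents = []                 # one predecessor map per later layer
        for layer in layers[1:]:
            new_dist, parent = {}, {}
            for v in layer:
                u = min(dist, key=lambda p: dist[p] + w(p, v))
                new_dist[v] = dist[u] + w(u, v)
                parent[v] = u
            parents.append(parent)
            dist = new_dist
        # Close the cycle back to the start vertex.
        last = min(dist, key=lambda p: dist[p] + w(p, start))
        total = dist[last] + w(last, start)
        if total < best_weight:
            tour = [last]
            for parent in reversed(parents):
                tour.append(parent[tour[-1]])
            best_weight, best_tour = total, tour[::-1]
    return best_weight, best_tour
```

The outer loop runs once per vertex of the smallest cluster and each sweep costs at most $O(ns)$, which is where the $O(n \gamma s)$ bound comes from.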

There are five meaningful combinations of QuickImprove and SlowImprove:

Basic $QuickImprove(T) = I(T)$ and $SlowImprove(T) = I(T)$. This actually yields the original TSP
local search applied to the TSP instance induced by the GTSP tour $T$.

Basic with CO $QuickImprove(T) = I(T)$ and $SlowImprove(T) = CO(T)$, i.e., the algorithm explores
the original TSP neighborhood but every time an improvement $T'$ is found, it is optimized in
$N_{CO}(T')$. One can also consider $SlowImprove(T) = L(T)$, but such an adaptation has no practical
interest. Indeed, SlowImprove is used quite rarely and so its influence on the total running time
is negligible. At the same time, $CO(T)$ is much more powerful than $L(T)$ with respect to solution
quality.

Local $QuickImprove(T) = L(T)$ and $SlowImprove(T) = I(T)$, i.e., every candidate $T' \in N_1(T)$ is
improved locally before it is compared to the original solution.

Local with CO $QuickImprove(T) = L(T)$ and $SlowImprove(T) = CO(T)$, which is the same as Local
but in addition it optimizes every improvement $T'$ globally in $N_{CO}(T')$.

Global $QuickImprove(T) = CO(T)$ and $SlowImprove(T) = I(T)$, i.e., every candidate $T' \in N_1(T)$ is
optimized globally in $N_{CO}(T')$ before it is compared to the original solution $T$.

For a TSP local search $LS$ we use $LS_B$, $LS_B^{co}$, $LS_L$, $LS_L^{co}$ and $LS_G$ to denote the Basic, Basic with
CO, Local, Local with CO and Global adaptations of $LS$, respectively.
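The five adaptations differ only in which routines are plugged into the loop of Algorithm 8. A minimal Python sketch of that loop is given below; the helper names (`tour_weight`, `two_opt_neighbors`, `adapted_local_search`) and the representation of a tour as a list of vertices are assumptions made for illustration, and the 2-opt generator merely stands in for an arbitrary TSP neighborhood $N_1(T)$.

```python
def tour_weight(tour, w):
    """Weight of the cycle that visits `tour` in order and returns to its start."""
    return sum(w(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def two_opt_neighbors(tour):
    """Example TSP neighborhood N_1(T): reverse every inner segment tour[i..j]."""
    m = len(tour)
    for i in range(1, m - 1):
        for j in range(i + 1, m):
            yield tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def adapted_local_search(tour, w, neighbors, quick_improve, slow_improve):
    """Loop in the style of Algorithm 8. Both improvement routines are expected
    to stay within N_CO(T) and never to increase the tour weight."""
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(tour):
            candidate = quick_improve(candidate)
            if tour_weight(candidate, w) < tour_weight(tour, w):
                tour = slow_improve(candidate)
                improved = True
                break  # restart the for loop from the new tour
    return tour
```

Passing the identity function for both routines gives the Basic adaptation; passing a CO routine as `quick_improve` gives the Global one.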

Some of these adaptations were applied in the literature. For example, the heuristics G2 and
G3 (Renaud and Boctor, 1998) are actually Global adaptations of the 2-opt and 3-opt TSP heuristics,
respectively. An enhanced implementation of the Global 2-opt adaptation is proposed by Hu and Raidl
(2008): asymptotically, it is faster than the naive implementation by factor 3. Local adaptations of
2-opt and some other neighborhoods were used by Fischetti et al. (1997); Gutin and Karapetyan (2010);
Silberholz and Golden (2007); Snyder and Daskin (2006); Tasgetiren et al. (2007). Some Basic
adaptations were used by Bontoux et al. (2010); Gutin and Karapetyan (2010); Silberholz and Golden (2007);
Snyder and Daskin (2006).



3.3. Global Adaptation 

The most powerful adaptation of a TSP local search for the GTSP is the Global adaptation. It applies
CO to every candidate tour before it is evaluated. In other words, if $N_1(T) \subseteq N_{ind}(T)$ is the original TSP
neighborhood, then the adapted neighborhood $N(T)$ is as follows:

$$N(T) = \bigcup_{T' \in N_1(T)} N_{CO}(T').$$

Observe that the Global adaptation turns a polynomial size TSP neighborhood into a very large
neighborhood, i.e., into a neighborhood of exponential size that can be explored in polynomial time. Indeed,
$N_{CO}(T_1) \cap N_{CO}(T_2) = \emptyset$ if the tours $T_1$ and $T_2$ have different cluster order. Hence, the size of $N(T)$ is
exactly

$$|N(T)| = |N_1(T)| \cdot \prod_{i=1}^{m} |C_i| \in O(|N_1(T)| \cdot s^m),$$

while it takes only $O(|N_1(T)| \cdot \gamma s n)$ operations to explore it. This approach was applied by Renaud and Boctor
(1998) and it was slightly improved by Hu and Raidl (2008).



We propose a new technique that can further speed up the Global adaptation. In particular, it is $m\gamma/s$
times faster than the straightforward adaptation described above. It was first applied in (Karapetyan and Gutin,
2011a) for the Lin-Kernighan heuristic. In this paper we generalize this approach and also provide some
additional improvements.

The main idea of our technique is to generate candidates $T' \in N_1(T)$ in a certain order such that
previously calculated shortest paths can be reused. Observe that any TSP local search is a special case
of $k$-opt. Indeed, any transformation of a TSP tour may be represented as a $k$-opt move, subject to a
sufficiently large value of $k$.

Let $k$-opt$(T, \alpha, \beta)$ be a tour obtained from $T$ by removing edges $\alpha$ and adding edges $\beta$, where $\alpha$ and
$\beta$ are edge sets, $|\alpha| = |\beta| = k$. We need to group all the candidates $T' \in N_1(T)$ into $g$ groups, each group
meeting the following requirements:

• Let $T^1, T^2, \ldots, T^l$ be a group of candidates and $T^i = k$-opt$(T, \alpha^i, \beta^i)$. Without loss of generality,
we may assume that $k = \text{const}$ for the whole group of candidates.

• Let $\alpha = \bigcap_{i=1}^{l} \alpha^i$ and let $\alpha'^i = \alpha^i \setminus \alpha$. Similarly, $\beta = \bigcap_{i=1}^{l} \beta^i$.

• Let $Q = (T \setminus \alpha) \cup \beta$, i.e., $Q$ is a set of paths and/or cycles produced from $T$ by removing the edges
$\alpha$ and adding the edges $\beta$.

• Removing the edges $\alpha'^i$ from $Q$ yields a number of paths, let us say $P_1^i, P_2^i, \ldots, P_{k-|\beta|}^i$. Our
requirement for each group is that every one of these paths has at least one fixed end:

$beginning(P_x^i) = beginning(P_x^j)$ for every $i, j \in \{1, 2, \ldots, l\}$, or
$end(P_x^i) = end(P_x^j)$ for every $i, j \in \{1, 2, \ldots, l\}$

for every $x = 1, 2, \ldots, k - |\beta|$.

• In order to achieve an $m\gamma/s$ times speed up, the number $g$ of groups must be $g \in O(|N_1(T)| / m)$, and
the number of edges in every $\alpha'^i$ must be fixed: $k - |\alpha| \in O(1)$.

If the above requirements are satisfied, the Global adaptation may be implemented as in Algorithm 9.
Observe that finding the shortest paths in a series of fragments $P_x^1, P_x^2, \ldots, P_x^l$ takes only $O(ns^2)$
operations: start from the fixed end of $P_x^i$ and calculate the shortest paths to every vertex in the required
direction. Since the number of fragments $k - |\beta|$ is fixed, finding the shortest paths in all fragments $P_j^i$,
$j = 1, 2, \ldots, k - |\beta|$, $i = 1, 2, \ldots, l$, also takes $O(ns^2)$ time. All the runs of CO take $O(ns^2)$ operations.
Thus, instead of the $O(mn\gamma s)$ operations needed for a 'naive' implementation to explore a group of $O(m)$
candidates, Algorithm 9 takes $O(ns^2)$ time.

Observe that this algorithm can be used for both symmetric and asymmetric GTSP. Indeed, even
if the orientation of some path in the candidate tour does not coincide with the orientation of this path in the
original tour, one can calculate the shortest paths within this fragment in the backward direction.
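The reuse of shortest paths from a fixed end can be sketched as a single sweep along the cluster sequence: the table for the fragment ending at position $j$ is obtained from the table for position $j - 1$. The Python function below is an illustrative sketch only; the name `prefix_shortest_paths` and the nested-dictionary layout are assumptions, and the fixed end is taken to be the beginning of the sequence.

```python
def prefix_shortest_paths(seq, w):
    """For a cluster sequence with the fixed beginning seq[0], return a list D
    such that D[j][u][v] is the weight of the shortest path from u in seq[0]
    to v in seq[j] that visits one vertex in every intermediate cluster."""
    first = seq[0]
    inf = float("inf")
    # Fragment of length one: a vertex reaches only itself, at zero cost.
    D = [{u: {v: (0.0 if u == v else inf) for v in first} for u in first}]
    for j in range(1, len(seq)):
        prev = D[-1]
        # Extend every stored path by one cluster; earlier work is reused.
        D.append({u: {v: min(prev[u][p] + w(p, v) for p in seq[j - 1])
                      for v in seq[j]}
                  for u in first})
    return D
```

Each extension step costs at most $s \cdot s \cdot |seq[j]|$ operations, so the whole sweep stays within $O(ns^2)$, and all fragments that share the fixed beginning are served by the same list `D`.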

3.3.1. Implementation Example 

Let us consider the 2-opt TSP neighborhood and its Global adaptation. Algorithm 10 enumerates
all the candidates in $N_{2\text{-opt}}(T)$. Consider a group of candidates $\{T^1, T^2, \ldots, T^l\} \subset N_{2\text{-opt}}(T)$
such that $T^i = k$-opt$(T, \alpha^i, \beta^i)$ for $i = 1, 2, \ldots, l$, where $\alpha^i = \{(T_x, T_{x+1}), (T_{y(i)}, T_{y(i)+1})\}$ and
$\beta^i = \{(T_x, T_{y(i)}), (T_{x+1}, T_{y(i)+1})\}$ (see Figure 2a). We get $\alpha = \{(T_x, T_{x+1})\}$ and $\beta = \emptyset$. Hence, $Q$ is a path
obtained from $T$ by removing the edge $(T_x, T_{x+1})$. Further removing the edge $\alpha'^i = \{(T_{y(i)}, T_{y(i)+1})\}$
splits $Q$ into two paths $(T_{x+1}, \ldots, T_{y(i)})$ and $(T_{y(i)+1}, \ldots, T_x)$. Observe that $(T_{x+1}, \ldots, T_{y(i)})$ has a fixed
beginning, and $(T_{y(i)+1}, \ldots, T_x)$ has a fixed end. Observe also that the number of candidate groups is
$O(m)$ while the total number of TSP candidates is $\Theta(m^2)$, and, hence, $g \in O(|N_{2\text{-opt}}(T)| / m)$.

Algorithm 11 explores the neighborhood $N_{2\text{-opt}}(T)$ for some fixed $x$. Compare the time complexity of
the naive exploration of $N_{2\text{-opt}}(T)$, which is $O(m^2 n \gamma s)$, with our adaptation, which takes only $O(mns^2)$
operations. If $s/\gamma \ll m$, which is a very natural assumption, our implementation is significantly faster
than the naive one.

Algorithm 9 General implementation of the Global adaptation of a TSP local search.

Require: Tour $T$ optimal in $N_{CO}(T)$, i.e., $T = CO(T)$.
Require: A group of candidates $T^1, T^2, \ldots, T^l$ such that $T^i = k$-opt$(T, \alpha^i, \beta^i)$ for $i = 1, 2, \ldots, l$.

Let $\alpha \leftarrow \bigcap_{i=1}^{l} \alpha^i$ and $\beta \leftarrow \bigcap_{i=1}^{l} \beta^i$. Let $\alpha'^i \leftarrow \alpha^i \setminus \alpha$ for $i = 1, 2, \ldots, l$.
Let $Q \leftarrow (T \setminus \alpha) \cup \beta$. Let $Q \setminus \alpha'^i = \{P_1^i, P_2^i, \ldots, P_{k-|\beta|}^i\}$ for $i = 1, 2, \ldots, l$. Note that the paths $P_j^i$ have
to meet the conditions above, see Section 3.3.
for $j \leftarrow 1, 2, \ldots, k - |\beta|$ do
    Calculate all the shortest paths through the cluster sequences corresponding to $P_j^1, P_j^2, \ldots, P_j^l$.
for $i \leftarrow 1, 2, \ldots, l$ do
    Construct a layered network $L$ as follows:
    • Each layer $2j - 1$, $j = 1, 2, \ldots, k - |\beta|$, corresponds to the cluster $beginning(P_j^i)$;
    • Each layer $2j$, $j = 1, 2, \ldots, k - |\beta|$, corresponds to the cluster $end(P_j^i)$;
    • The weights between layers $2j - 1$ and $2j$ are equal to the shortest paths in $P_j^i$;
    • The weights between layers $2j$ and $2j + 1$ are equal to the weights between corresponding
      clusters.
    • Layer $2(k - |\beta|) + 1$ is a copy of layer 1, and the weights between layers $2(k - |\beta|)$ and
      $2(k - |\beta|) + 1$ are equal to the weights between corresponding clusters.
    Find the shortest cycle $C$ in the layered network $L$ using the CO algorithm.
    if $w(C) < w(T)$ then
        Update tour $T$ according to the cycle $C$.
        Restart the algorithm.
return $T$.

Algorithm 10 Enumeration of all the candidates in the TSP 2-opt neighborhood.

Require: Tour $T$.

for $x \leftarrow 1, 2, \ldots, m - 2$ do
    for $y \leftarrow x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$ do
        List the candidate $Turn(T, x, y)$ (see Section 1).



3.4. Global Adaptation Refinements

Observe that the above proposed implementation of the Global adaptation consists of (a) calculating
the shortest paths through tour fragments, and (b) calculating the shortest cycles. Both parts are time
consuming; for example, in 2-opt$_G$, each of (a) and (b) takes $O(mns^2)$ operations. In Section 3.4.1 we try
to predict if a candidate can improve the current solution without running CO. This only or almost only
affects part (b). To improve (a), in Sections 3.4.2 and 3.4.3 we propose an approach that dramatically
reduces the number of shortest paths to be calculated. It also saves time on part (b) by selecting smaller
clusters for the layers in networks $L$.

3.4.1. Lower Bound

In the proposed adaptation, we calculate the shortest cycle on every iteration. Having a lower bound
for the shortest cycle, one could omit some of these calculations.

Assume that the rearranged tour $T$ consists of $k$ cluster sequences $P^1, P^2, \ldots, P^k$ such that $end(P^i)$ is
connected to $beginning(P^{i+1})$ and $end(P^k)$ is connected to $beginning(P^1)$, where $beginning(P^i)$ ($end(P^i)$)
is the first (the last) cluster in $P^i$. Let $p^i$ be the weight of the shortest path through the cluster sequence
$P^i$. Then the following is a lower bound for the shortest cycle in this sequence of clusters:

$$w(CO(T)) \ge \sum_{i=1}^{k} \left[ p^i + w_{min}(end(P^i), beginning(P^{i+1})) \right],$$

where $P^{k+1} = P^1$. Recall that $w_{min}(X, Y)$ is the weight of the shortest edge from cluster $X$ to cluster $Y$.
It would take too much time to calculate the shortest paths $p^i$ on every iteration. Instead, we propose
a lower bound for $p^i$ according to Theorem 3.

(b) Having all the shortest paths between $\mathcal{T}_y$ and $\mathcal{T}_{x+1}$, and
between $\mathcal{T}_{y+1}$ and $\mathcal{T}_x$, one can construct a layered network
and apply CO to it in order to find the shortest cycle in the
whole rearranged tour.

Figure 2: Global adaptation of the 2-opt heuristic.



Algorithm 11 Global adaptation of 2-opt.

Require: Tour $T$.
Let $\mathcal{T}_i \leftarrow Cluster(T_i)$.
for $x \leftarrow 1, 2, \ldots, m - 2$ do
    Calculate the shortest paths along the tour $T$ from every vertex in $\mathcal{T}_y$ to every vertex in $\mathcal{T}_{x+1}$ and
    from every vertex in $\mathcal{T}_{y+1}$ to every vertex in $\mathcal{T}_x$ for every $y = x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$.
    for $y \leftarrow x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$ do
        Construct a layered network $L$ as in Figure 2b.
        Apply CO to $L$ to get the shortest cycle $C$.
        if $w(C) < w(T)$ then
            Replace $T$ with $C$.
            Restart the whole algorithm.



Theorem 3. For the shortest path from an arbitrary vertex in $\mathcal{T}_a$ to an arbitrary vertex in $\mathcal{T}_b$ in a layered
network $\mathcal{T}_1 \cup \mathcal{T}_2 \cup \ldots \cup \mathcal{T}_m$ we have:

$$w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}, \ldots, \mathcal{T}_b) \ge w(T_a, T_{a+1}, \ldots, T_b)
  - w_{max}(T_a, \mathcal{T}_{a+1}) - w_{max}(\mathcal{T}_{b-1}, T_b) + w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}) + w_{min}(\mathcal{T}_{b-1}, \mathcal{T}_b), \quad (6)$$

where $(T_1, T_2, \ldots, T_m, T_1)$ is the shortest cycle through all the layers of the network.

Proof. Observe that $(T_a, T_{a+1}, \ldots, T_b)$ is the shortest path from $T_a$ to $T_b$ through the layers $\mathcal{T}_{a+1}, \mathcal{T}_{a+2},
\ldots, \mathcal{T}_{b-1}$. Indeed, if there was a shorter path, the shortest cycle $(T_1, T_2, \ldots, T_m, T_1)$ could be improved.
Assume that there exists some path $(T'_a, T'_{a+1}, \ldots, T'_b)$, $T'_i \in \mathcal{T}_i$, shorter than the lower bound provided
in (6):

$$w(T'_a, T'_{a+1}, \ldots, T'_b) < w(T_a, T_{a+1}, \ldots, T_b)
  - w_{max}(T_a, \mathcal{T}_{a+1}) - w_{max}(\mathcal{T}_{b-1}, T_b) + w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}) + w_{min}(\mathcal{T}_{b-1}, \mathcal{T}_b). \quad (7)$$

Observe that

$$w(T_a, T'_{a+1}) - w_{max}(T_a, \mathcal{T}_{a+1}) \le w(T'_a, T'_{a+1}) - w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}) \quad \text{and} \quad (8)$$
$$w(T'_{b-1}, T_b) - w_{max}(\mathcal{T}_{b-1}, T_b) \le w(T'_{b-1}, T'_b) - w_{min}(\mathcal{T}_{b-1}, \mathcal{T}_b) \quad (9)$$

because the left-hand sides of (8) and (9) are non-positive and the right-hand sides are non-negative. We
have $w(T'_a, T'_{a+1}, \ldots, T'_b) = w(T'_a, T'_{a+1}) + w(T'_{a+1}, T'_{a+2}, \ldots, T'_{b-1}) + w(T'_{b-1}, T'_b)$. By substitution of the lower
bounds for $w(T'_a, T'_{a+1})$ and $w(T'_{b-1}, T'_b)$ obtained from (8) and (9), respectively, into (7) we get:

$$w(T_a, T'_{a+1}) - w_{max}(T_a, \mathcal{T}_{a+1}) + w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}) + w(T'_{a+1}, T'_{a+2}, \ldots, T'_{b-1})
  + w(T'_{b-1}, T_b) - w_{max}(\mathcal{T}_{b-1}, T_b) + w_{min}(\mathcal{T}_{b-1}, \mathcal{T}_b)$$
$$< w(T_a, T_{a+1}, \ldots, T_b) - w_{max}(T_a, \mathcal{T}_{a+1}) - w_{max}(\mathcal{T}_{b-1}, T_b) + w_{min}(\mathcal{T}_a, \mathcal{T}_{a+1}) + w_{min}(\mathcal{T}_{b-1}, \mathcal{T}_b).$$

From that we have

$$w(T_a, T'_{a+1}) + w(T'_{a+1}, T'_{a+2}, \ldots, T'_{b-1}) + w(T'_{b-1}, T_b) < w(T_a, T_{a+1}, \ldots, T_b).$$

Hence, the path $(T_a, T'_{a+1}, T'_{a+2}, \ldots, T'_{b-1}, T_b)$ is shorter than $(T_a, T_{a+1}, \ldots, T_b)$, a contradiction. $\square$

Observe that, having precalculated $w_{min}(X, Y)$ for every pair of clusters $X$ and $Y$ and $w_{max}(x, Y)$ and
$w_{max}(Y, x)$ for every pair of vertex $x$ and cluster $Y$, it takes only $O(1)$ time to compute the lower bound
(6). A drawback of this approach is that it needs the shortest cycle $(T_1, T_2, \ldots, T_m, T_1)$ corresponding to
the current solution, i.e., every time an improvement is found, one has to use CO to find the tour itself 
(recall that we normally need only the cluster order and the weight of current solution). These additional 
calls of CO, however, do not take much time in practice. 

In our experiments the use of the lower bound speeds up the 2-opt Global adaptation by a factor of about
three, on average. The lower bound works better for large instances because the lower bounds for large
instances have better relative precision. Indeed, the number of edges calculated imprecisely is always 
fixed while the total number of edges included in the lower bound increases with the increase of the 
instance size. 
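To make the constant-time evaluation of the lower bound (6) concrete, here is a small Python sketch. It assumes the tour vertices are given in cluster order and, as a simplification not taken from the paper, precalculates $w_{min}$ and $w_{max}$ only for consecutive clusters of the current tour together with prefix sums of the tour edges; the name `make_lower_bound` and the 0-based positions are likewise assumptions.

```python
def make_lower_bound(tour, clusters, w):
    """tour[i] is the vertex selected in clusters[i] (0-based positions).
    Returns lb(a, b), a < b, which evaluates the right-hand side of (6)
    in constant time after a linear-size precalculation."""
    m = len(tour)
    # prefix[i] is the weight of the tour path from position 0 to position i.
    prefix = [0.0]
    for i in range(m - 1):
        prefix.append(prefix[-1] + w(tour[i], tour[i + 1]))
    # Shortest edge between consecutive clusters.
    wmin = [min(w(u, v) for u in clusters[i] for v in clusters[i + 1])
            for i in range(m - 1)]
    # Longest edge from the tour vertex at i into the next cluster ...
    wmax_out = [max(w(tour[i], v) for v in clusters[i + 1]) for i in range(m - 1)]
    # ... and from cluster i into the tour vertex at i + 1.
    wmax_in = [max(w(u, tour[i + 1]) for u in clusters[i]) for i in range(m - 1)]

    def lb(a, b):
        return (prefix[b] - prefix[a] - wmax_out[a] - wmax_in[b - 1]
                + wmin[a] + wmin[b - 1])

    return lb
```

The bound is only valid when `tour` is the shortest cycle for the given cluster order, which is why the text above notes that CO has to be rerun after every accepted improvement.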

In certain cases the lower bound (6) can be improved using Theorem 4.

Theorem 4. For the shortest path from an arbitrary vertex in $\mathcal{T}_1$ to an arbitrary vertex in $\mathcal{T}_m$ in a
layered network $\mathcal{T}_1 \cup \mathcal{T}_2 \cup \ldots \cup \mathcal{T}_m$ we have:

$$w_{min}(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m) \ge w(T_1, T_2, \ldots, T_m, T_1) - w_{max}(\mathcal{T}_m, \mathcal{T}_1),$$

where $(T_1, T_2, \ldots, T_m, T_1)$ is the shortest cycle through all the layers of the network.

Proof. Assume that there exists a path $(T'_1, T'_2, \ldots, T'_m)$, $T'_i \in \mathcal{T}_i$, such that

$$w(T'_1, T'_2, \ldots, T'_m) < w(T_1, T_2, \ldots, T_m, T_1) - w_{max}(\mathcal{T}_m, \mathcal{T}_1).$$

Close up this path with the edge $(T'_m, T'_1)$. Observe that the weight of the obtained cycle is

$$w(T'_1, T'_2, \ldots, T'_m, T'_1) < w(T_1, T_2, \ldots, T_m, T_1) + w(T'_m, T'_1) - w_{max}(\mathcal{T}_m, \mathcal{T}_1). \quad (10)$$

Thus, $w(T'_1, T'_2, \ldots, T'_m, T'_1) < w(T_1, T_2, \ldots, T_m, T_1)$, a contradiction. $\square$

3.4.2. Supporting Cluster

Observe that, in general, skipping some of the shortest cycle calculations (see Section 3.4.1) does
not decrease the time spent to find the shortest paths. Indeed, even if the shortest paths between some
clusters $\mathcal{T}_i$ and $\mathcal{T}_j$ are not required due to the lower bound, these paths are still needed, e.g., to find the
shortest paths between $\mathcal{T}_i$ and $\mathcal{T}_{j+1}$.

We propose an approach that significantly reduces the number of shortest paths required for the 
global adaptation. It also guarantees that the layered network L constructed on every iteration will 
always contain the smallest cluster (recall that the CO performance significantly depends on the size of 
the smallest cluster in L). This is achieved at the cost of a larger number of layers in L. 

Consider the 2-opt$_G$ implementation discussed in Section 3.3.1. Observe that the fragment
$(\mathcal{T}_{y+1}, \mathcal{T}_{y+2}, \ldots, \mathcal{T}_x)$ always contains cluster $\mathcal{T}_1$. Let us calculate all the shortest paths in fragments
$(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_i)$ and $(\mathcal{T}_{i+1}, \mathcal{T}_{i+2}, \ldots, \mathcal{T}_m, \mathcal{T}_1)$ for every $i = 2, 3, \ldots, m - 1$. Now, by adding $\mathcal{T}_1$ as an
additional layer to the layered network $L$, we avoid calculations of the shortest paths from $\mathcal{T}_{y+1}$ to $\mathcal{T}_x$,
see Figure 3. We call $\mathcal{T}_1$ a supporting cluster.

Figure 3: $\mathcal{T}_1$ is a supporting cluster in the layered network $L$. Instead of calculating the shortest paths
from every $\mathcal{T}_{y+1}$ to every $\mathcal{T}_x$, i.e., for $O(m^2)$ combinations of $x$ and $y + 1$, we only need the shortest paths
from every $\mathcal{T}_{y+1}$ to $\mathcal{T}_1$, i.e., for $O(m)$ values of $y + 1$, and from $\mathcal{T}_1$ to every $\mathcal{T}_x$, i.e., for $O(m)$ values of $x$.

Let us find out how a supporting cluster influences the algorithm's performance. Observe that adding
an extra layer to $L$ requires $O(mns^2)$ extra operations to calculate the shortest cycles. However, adding an
extra layer may also save some operations. Since we are allowed to rotate the tour, let $\mathcal{T}_1$ be the smallest
cluster, i.e., $|\mathcal{T}_1| = \gamma$. Then a more accurate estimation shows that the implementation of 2-opt$_G$ proposed
in Algorithm 11 spends $O(mn(2s^2 + s))$ operations on all the CO runs, and with a supporting cluster it
would take $O(mn\gamma(3s + 1))$ operations on it. Hence, if $\gamma/s < 2/3$, which is very typical, introducing the
supporting cluster speeds up the algorithm.
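The threshold $2/3$ follows directly from the two operation counts quoted above. A short derivation, dropping the common factor $mn$ and treating the constants inside the $O(\cdot)$ expressions as exact (an assumption made only for this check):

```latex
\gamma (3s + 1) < 2s^2 + s
\iff \frac{\gamma}{s} < \frac{2s + 1}{3s + 1},
\qquad
\frac{2s + 1}{3s + 1} \ge \frac{2}{3}
\iff 6s + 3 \ge 6s + 2 .
```

Since the last inequality always holds, $\gamma/s < 2/3$ is sufficient for the variant with a supporting cluster to need fewer operations on the CO runs.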

Observe that supporting cluster can be used only if a group of fragments shares some cluster, preferably 
of a small size. Next we propose an improvement of this technique that gives more flexibility and improves 
the time complexity of the algorithm. 

3.4.3. Multiple Supporting Clusters

Let us consider the problem of finding the shortest paths along a sequence of clusters $(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m)$,
i.e., finding the shortest path from every $u \in \mathcal{T}_i$ to every $v \in \mathcal{T}_j$ through $(\mathcal{T}_{i+1}, \mathcal{T}_{i+2}, \ldots, \mathcal{T}_{j-1})$ for every
$1 \le i < j \le m$. Using the dynamic programming approach straightforwardly, one can solve the problem
in $O(ns^2 m)$ time. We propose an algorithm that, by introducing several supporting clusters, solves it in
$O(ns^2 \log_2 m)$ operations such that every $(u, v)$-path contains at most one supporting cluster.

For $m = 2$, no calculations are required because the shortest $(u, v)$-path, $u \in \mathcal{T}_1$ and $v \in \mathcal{T}_2$, is
$(u, v)$. For $m > 2$, let us introduce a supporting cluster $\mathcal{T}_{m/2}$ and calculate all the shortest paths in
$(\mathcal{T}_i, \mathcal{T}_{i+1}, \ldots, \mathcal{T}_{m/2})$ and $(\mathcal{T}_{m/2}, \mathcal{T}_{m/2+1}, \ldots, \mathcal{T}_j)$ for every $i < m/2$ and every $j > m/2$. This takes $O(ns^2)$
operations. Using the same technique, find the shortest paths in the subsequences $(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{m/2-1})$ and
$(\mathcal{T}_{m/2+1}, \mathcal{T}_{m/2+2}, \ldots, \mathcal{T}_m)$. Using recursion, we can solve the whole problem in $O(n \log_2 m \, s^2)$ operations.
Now, in order to obtain the shortest $(u, v)$-paths, where $u \in \mathcal{T}_i$, $v \in \mathcal{T}_j$ and $1 \le i < j \le m$, do the
following. If either $i = m/2$ or $j = m/2$, the corresponding shortest paths are already calculated. If $i < m/2$
and $j > m/2$, take the shortest paths from $\mathcal{T}_i$ to $\mathcal{T}_{m/2}$ and from $\mathcal{T}_{m/2}$ to $\mathcal{T}_j$ and use $\mathcal{T}_{m/2}$ as a supporting
cluster. If $j < m/2$ or $i > m/2$, refer to the corresponding subproblem.
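The recursion described above can be sketched in Python as follows. The sketch always takes the middle position as the supporting cluster, stores the distances to and from it, and answers a query by descending the same implicit binary tree. The function name `build_supporting_paths`, the dictionary-based tables and the 0-based positions are assumptions made for the example.

```python
INF = float("inf")


def build_supporting_paths(seq, w):
    """Precalculate shortest paths to and from recursively chosen supporting
    clusters of the sequence `seq`; return query(i, j) for positions i < j,
    which maps every (u, v), u in seq[i], v in seq[j], to the path weight."""
    m = len(seq)
    to_sup, from_sup = {}, {}   # keyed by (position, supporting position)

    def build(lo, hi):
        if hi - lo < 2:          # at most two clusters: direct edges suffice
            return
        t = (lo + hi) // 2       # supporting cluster of this subsequence
        d = {(p, v): (0.0 if p == v else INF) for p in seq[t] for v in seq[t]}
        for i in range(t - 1, lo - 1, -1):      # sweep to the left of t
            d = {(u, v): min(w(u, p) + d[p, v] for p in seq[i + 1])
                 for u in seq[i] for v in seq[t]}
            to_sup[i, t] = d
        e = {(u, p): (0.0 if u == p else INF) for u in seq[t] for p in seq[t]}
        for j in range(t + 1, hi + 1):          # sweep to the right of t
            e = {(u, v): min(e[u, p] + w(p, v) for p in seq[j - 1])
                 for u in seq[t] for v in seq[j]}
            from_sup[t, j] = e
        build(lo, t - 1)
        build(t + 1, hi)

    build(0, m - 1)

    def query(i, j):
        lo, hi = 0, m - 1
        while hi - lo >= 2:
            t = (lo + hi) // 2
            if i == t:
                return from_sup[t, j]
            if j == t:
                return to_sup[i, t]
            if i < t < j:        # combine through the supporting cluster
                a, b = to_sup[i, t], from_sup[t, j]
                return {(u, v): min(a[u, p] + b[p, v] for p in seq[t])
                        for u in seq[i] for v in seq[j]}
            if j < t:
                hi = t - 1
            else:
                lo = t + 1
        return {(u, v): float(w(u, v)) for u in seq[i] for v in seq[j]}

    return query
```

Every level of the recursion sweeps each cluster at most once, so the precalculation performs $O(ns^2)$ operations per level over $O(\log_2 m)$ levels.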

Note that splitting the sequence of clusters into two parts is optimal. For example, splitting it
into three parts requires $O(\frac{5}{3} ns^2)$ operations to calculate the shortest paths for these two supporting
clusters, i.e., the recursive procedure takes $O(\frac{5}{3} n \log_3 m \, s^2)$ operations. Note that $\frac{5}{3} \log_3 m > \log_2 m$ for
every $m > 1$.

Selecting $\mathcal{T}_{m/2}$ as a supporting cluster is the optimal choice when $|\mathcal{T}_i| = s$ for every $i$. In practice, it
is often better to select some other cluster $\mathcal{T}_t$ such that $t \approx m/2$ if $|\mathcal{T}_t| < |\mathcal{T}_{m/2}|$. Indeed, the size of the
supporting cluster is important during both calculating the shortest paths and running CO. Finding the
optimal $t$, however, is hard. We use the following simple heuristic to find a good value of $t$. We select
the supporting cluster $\mathcal{T}_t$ such that

$$|\mathcal{T}_t| = \min_{|i - m/2| < m/6} |\mathcal{T}_i| \quad \text{and} \quad |t - m/2| \text{ is minimized.} \quad (11)$$
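Rule (11) translates into a few lines of Python. In the sketch below the function name `select_supporting_cluster`, the 0-based inclusive range `[lo, hi]` and the fallback to the middle position for very short ranges are assumptions added for illustration.

```python
def select_supporting_cluster(sizes, lo, hi):
    """Pick a supporting cluster position t in [lo, hi] following rule (11):
    among positions closer to the middle than one sixth of the range length,
    take a smallest cluster; ties are broken by the distance to the middle."""
    length = hi - lo + 1
    mid = (lo + hi) / 2
    window = [i for i in range(lo, hi + 1) if abs(i - mid) < length / 6]
    if not window:               # very short range: fall back to the middle
        window = [(lo + hi) // 2]
    return min(window, key=lambda i: (sizes[i], abs(i - mid)))
```

For example, with cluster sizes `[5, 4, 9, 3, 7, 2, 8, 3, 6, 1, 4, 5]` and the full range, the window consists of positions 4 to 7 and the function selects position 5, whose cluster has size 2.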

Since the positions of the supporting clusters are variable, there has to be a data structure to store
them, and an algorithm is required to find the necessary supporting cluster when seeking the shortest
path between $\mathcal{T}_i$ and $\mathcal{T}_j$ for some $1 \le i < j \le m$. For this purpose we build a binary tree of supporting
cluster positions. The root of this tree is the index $t$ of the supporting cluster selected for the sequence
$(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m)$. The root has two children corresponding to the supporting clusters selected in the
sequences $(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_t)$ and $(\mathcal{T}_t, \mathcal{T}_{t+1}, \ldots, \mathcal{T}_m)$, respectively, etc.

We do not calculate all the shortest paths to and from the supporting clusters in advance but use the
dynamic programming approach. This saves significant time if some local search move is accepted.

Note that it takes $O(\log_2 m)$ operations to find the necessary supporting cluster. However, we can
usually do this search in $O(1)$ operations by reusing the result of the previous search, see Algorithm 12.
In this algorithm, we exploit the fact that two supporting clusters can never have the same position.

Algorithm 12 Search for the supporting cluster for 2-opt$_G$.

Require: Fixed position $x$ (see Algorithm 10).
Require: The supporting cluster tree defined by $root$, $left(i)$ and $right(i)$.

Let $k \leftarrow 0$.
Initialize current supporting cluster position $t \leftarrow root$.
while $t < x$ or $t > x + 2$ do
    if $t > x + 2$ then
        Set $k \leftarrow k + 1$ and save $p_k \leftarrow t$.
        $t \leftarrow left(t)$.
    else
        $t \leftarrow right(t)$.
for $y \leftarrow x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$ do
    if $k > 1$ and $y = p_{k-1}$ then
        $k \leftarrow k - 1$.
        Use the distances from $\mathcal{T}_y$ to $\mathcal{T}_{x+1}$.
        Update current supporting cluster $t \leftarrow p_k$.
    else
        Use the distances from $\mathcal{T}_y$ to $\mathcal{T}_t$ and from $\mathcal{T}_t$ to $\mathcal{T}_{x+1}$ with supporting cluster $\mathcal{T}_t$.

Thus, the whole supporting cluster tree can be stored in an array of size m, and a supporting cluster can 
be located by its position. 

With all the improvements, 2-opt$_G$ takes only $O(\gamma s n + ns^2 \log_2 m)$ operations on shortest paths
calculation and $O(\gamma s m n)$ operations on running CO on every iteration. Recall that the original
implementation of 2-opt$_G$ takes $O(s^2 m n)$ operations to proceed. Hence, the time complexity of the refined
2-opt$_G$ implementation is $O(sn(s \log_2 m + \gamma m))$, which is $O\left(\frac{sm}{s \log_2 m + \gamma m}\right)$ times faster than $O(s^2 m n)$.

In the discussion above, we assumed exploration of a full neighborhood and, hence, calculated all the
needed shortest paths along $(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m)$. However, in practice, we do not normally explore the whole
neighborhood but rearrange the tour as soon as we find an improvement. Hence, heavy preprocessing of 
a tour is usually unacceptable. This means that we should calculate as few shortest paths as necessary 
for every particular candidate and when an improvement is accepted we should reuse the precalculated 
shortest paths as many times as possible. 

We propose the following implementation. A matrix $S_{u,v}$ is used to store the shortest distances along
the given fragment, where $u$ and $v$ are the origin and the destination vertices, respectively. There are $m - 2$
possible supporting clusters; for every possible supporting cluster $\mathcal{T}_i$ we store positions $left(i)$, $right(i)$,
$leftmost(i)$ and $rightmost(i)$. Positions $left(i)$ and $right(i)$ point to the child supporting clusters of $\mathcal{T}_i$;
$leftmost(i)$ is the position of the leftmost cluster from which the shortest distances to $\mathcal{T}_i$ are calculated
and valid; $rightmost(i)$ is the position of the rightmost cluster to which the shortest distances from $\mathcal{T}_i$
are calculated and valid.

First, we initialize $leftmost(i) \leftarrow i$, $rightmost(i) \leftarrow i$, $left(i) \leftarrow -1$ and $right(i) \leftarrow -1$ for every $i$
and select the root position $root$ according to (11). The values $left(i)$ or $right(i)$ are then calculated on
demand according to the same procedure.

In Algorithm 12, prior to using the shortest distances from cluster $\mathcal{T}_j$ to supporting cluster $\mathcal{T}_i$, we
check if $leftmost(i) < j$. If not, we update the shortest distances $S$ and $leftmost(i)$ accordingly. Similarly,
we use the value $rightmost(i)$ when we need the shortest distances from $\mathcal{T}_i$ to $\mathcal{T}_j$.

When a tour fragment $(\mathcal{T}_x, \mathcal{T}_{x+1}, \ldots, \mathcal{T}_y)$ is modified, we update all the information for every possible
supporting cluster. In particular, in the 2-opt$_G$ implementation, if $x < i < y$, we reset all the
corresponding information: $leftmost(i) \leftarrow i$ and $rightmost(i) \leftarrow i$. Otherwise, if $leftmost(i) < y$, we update
$leftmost(i) \leftarrow y + 1$ and if $rightmost(i) > x$, we update $rightmost(i) \leftarrow x - 1$. We also reset all the values
$left = right = -1$. Finally, we choose the root position $root$ according to the procedure above. Note that,
although we destroy the supporting cluster tree every time the tour is updated, it is likely that the new
tree will reuse some of the old supporting clusters with all accumulated data.

3.5. k-opt

The $k$-opt neighborhood is widely used for the TSP and some other combinatorial optimization problems,
see, e.g., (Fischetti et al., 1997; Karapetyan and Gutin, 2011b; Gutin and Karapetyan, 2010; Snyder and Daskin,
2006). It was shown to be very efficient for the TSP (Helsgaun, 2009). In general, $N_{k\text{-opt}}(T)$ contains
all the solutions that can be obtained from $T$ by selecting $k$ elements in $T$ and then replacing them with
$k$ new elements such that the feasibility of the solution is preserved. In the TSP and the GTSP, $k$-opt
means replacing $k$ existing edges in the solution with $k$ new edges.

The time complexity of $k$-opt increases exponentially with the growth of $k$. In practice, only 2-opt
and 3-opt are used for the TSP (Helsgaun, 2000; Lin, 1965) with rare exceptions (Helsgaun, 2009). We
do not consider $k$-opt for $k > 3$.

3.6. 2-opt

For $k = 2$ and for a fixed pair of edges $(T_x, T_{x+1})$, $(T_y, T_{y+1})$ there are only two options for every
2-opt move, i.e., to replace these edges either with $(T_x, T_y)$ and $(T_{x+1}, T_{y+1})$ or with $(T_{y+1}, T_{x+1})$ and
$(T_y, T_x)$. However, for the symmetric case both options are identical and it takes only $O(1)$ operations
to evaluate a 2-opt move, see (1). Hence, it takes $O(m^2)$ operations to explore the whole neighborhood
$N_{2\text{-opt}}(T)$ in the symmetric case.

We consider two algorithms to explore the 2-opt neighborhood, namely 'simple' and 'advanced'. The
'simple' one tries all feasible pairs of $x$ and $y$ with $y > x$, see Algorithm 13. Note that after an improvement
is applied, it is not necessary to explore the whole neighborhood again. We use an efficient approach to
avoid such repetitions. In particular, the algorithm stores a flag $b(T_i)$ for every vertex $T_i$. This flag shows
if the edge starting from $T_i$ was changed since the last check. Observe that a move $Turn(T, x, y)$ is
redundant if both edges $(T_x, T_{x+1})$ and $(T_y, T_{y+1})$ stay unchanged since the last check of $Turn(T, x, y)$.

Algorithm 13 Basic 2-opt algorithm, 'simple' implementation (symmetric case).

Require: Tour $T = (T_1, T_2, \ldots, T_m, T_1)$.

Initialize $b(T_i) \leftarrow true$ for every $i = 1, 2, \ldots, m$.
repeat
    Initialize $optimal \leftarrow true$.
    Initialize $b'(T_i) \leftarrow false$ for every $i = 1, 2, \ldots, m$.
    for $x \leftarrow 1, 2, \ldots, m - 2$ do
        for $y \leftarrow x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$ do
            if $b(T_x) = false$ and $b(T_y) = false$ then
                Go to the next $y$.
            $\Delta \leftarrow w(T_x, T_y) + w(T_{x+1}, T_{y+1}) - w(T_x, T_{x+1}) - w(T_y, T_{y+1})$.
            if $\Delta < 0$ then
                Replace edges $(T_x, T_{x+1})$ and $(T_y, T_{y+1})$ in $T$ with edges $(T_x, T_y)$ and $(T_{x+1}, T_{y+1})$.
                'Invalidate' vertices: $b'(T_i) \leftarrow true$ for every $i = x, x + 1, \ldots, y$.
                Set $optimal \leftarrow false$.
                Continue to the next $x$.
    Swap $b$ and $b'$.
until $optimal = true$

The second, 'advanced', algorithm is only suitable for symmetric problems. It considers all the
values $x \in \{1, 2, \ldots, m\}$ and for every $x$ it takes all feasible $y$ such that $w(T_x, T_y) < w(T_x, T_{x+1})$ or
$w(T_{x+1}, T_{y+1}) < w(T_x, T_{x+1})$. Indeed, if a pair of edges was not considered at all (neither when $x > y$
nor when $x < y$), then both $w(T_x, T_y) \ge w(T_x, T_{x+1})$ and $w(T_{x+1}, T_{y+1}) \ge w(T_y, T_{y+1})$, which cannot be
an improving move. For details see (Johnson and McGeoch, 2002).

An efficient implementation of the 'advanced' algorithm requires some precalculation. Let $l(v)$ be a
list of all vertices $v' \ne v$ ordered such that $w(v, l(v)_i) \le w(v, l(v)_j)$ for every $i < j$. For a fixed $x$, try
$T_y \leftarrow l(T_x)_i$ for every $i = 1, 2, \ldots$ until $w(T_x, T_y) \ge w(T_x, T_{x+1})$. Similarly, try $T_{y+1} \leftarrow l(T_{x+1})_i$ for every
$i = 1, 2, \ldots$ until $w(T_{x+1}, T_{y+1}) \ge w(T_x, T_{x+1})$. This will exhaust all necessary values of $y$. Note that
for the GTSP, one has either to precalculate the lists $l(v)$ every time before the 2-opt run or, instead, keep
clusters in the lists $l(v)$ such that $w_{min}(v, l(v)_i) \le w_{min}(v, l(v)_j)$ for any $i < j$.
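The 'simple' exploration with change flags (Algorithm 13) can be sketched in Python as follows. The function name `two_opt_simple`, the 0-based positions, the dictionary of flags keyed by vertex and the small tolerance `eps` (added to guard against floating-point noise) are assumptions made for this example; weights are assumed symmetric.

```python
def two_opt_simple(tour, w, eps=1e-12):
    """Symmetric 2-opt local search in the style of Algorithm 13.
    `tour` is a list of distinct vertices, `w(u, v)` a symmetric weight."""
    tour = list(tour)
    m = len(tour)
    changed = {v: True for v in tour}        # flags b
    optimal = False
    while not optimal:
        optimal = True
        changed_next = {v: False for v in tour}   # flags b'
        for x in range(m - 2):
            for y in range(x + 2, min(m, x + m - 1)):
                a, b = tour[x], tour[x + 1]
                c, d = tour[y], tour[(y + 1) % m]
                if not changed[a] and not changed[c]:
                    continue                 # both edges unchanged: skip
                delta = w(a, c) + w(b, d) - w(a, b) - w(c, d)
                if delta < -eps:
                    # Reversing positions x+1..y swaps the two edge pairs.
                    tour[x + 1:y + 1] = tour[x + 1:y + 1][::-1]
                    for i in range(x, y + 1):
                        changed_next[tour[i]] = True
                    optimal = False
                    break                    # continue with the next x
        changed = changed_next
    return tour
```

A pair is skipped only if neither of its two edges has changed since it was last evaluated, so the final pass without improvements certifies that no improving 2-opt move remains.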

For the asymmetric problem, one standalone move $Turn(T, x, y)$ of 2-opt requires $O(m)$ operations.
There are two options to reconnect the fragments and each of the options requires one of these fragments
to be inverted. However, it is still possible to explore the whole neighborhood $N_{2\text{-opt}}(T)$ in $O(m^2)$. For
this purpose the 2-opt moves should be carried out in a certain sequence, see Algorithm 14. On every
iteration, the variable $\delta$ stores the weight difference caused by inverting the fragment $(T_{x+1}, T_{x+2}, \ldots, T_y)$,
i.e.,

$$\delta = w(T_{x+1}, T_{x+2}, \ldots, T_y) - w(T_y, T_{y-1}, \ldots, T_{x+1}).$$

In order to consider the moves $Turn(T, x, y)$ where $x > y$, invert the given tour, $T_{inv} = (T_m, T_{m-1}, \ldots, T_1, T_m)$,
and apply the procedure again.

Algorithm 14 Basic 2-opt implementation for asymmetric problem.

Require: Tour $T = (T_1, T_2, \ldots, T_m, T_1)$.
for $x \leftarrow 1, 2, \ldots, m - 2$ do
    Initialize $\delta \leftarrow 0$.
    for $y \leftarrow x + 2, x + 3, \ldots, \min\{m, x + m - 2\}$ do
        Update $\delta \leftarrow \delta + w(T_{y-1}, T_y) - w(T_y, T_{y-1})$.
        $\Delta \leftarrow w(T_x, T_y) + w(T_{x+1}, T_{y+1}) - w(T_x, T_{x+1}) - w(T_y, T_{y+1}) - \delta$.
        if $\Delta < 0$ then
            The tour $Turn(T, x, y)$ is an improvement over $T$.
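The incremental update of $\delta$ can be illustrated with a short Python generator that reports the weight change of every move with $x < y$ in constant time per pair. The function name `asymmetric_two_opt_deltas` and the 0-based positions are assumptions of this sketch.

```python
def asymmetric_two_opt_deltas(tour, w):
    """Yield (x, y, delta) for every 2-opt move with x < y (0-based positions),
    where delta is the change of the tour weight if the fragment
    tour[x+1..y] is inverted. `w(u, v)` may be asymmetric."""
    m = len(tour)
    for x in range(m - 2):
        inv = 0.0   # forward minus backward weight of the fragment tour[x+1..y]
        for y in range(x + 2, min(m, x + m - 1)):
            # Extend the fragment by one edge instead of recomputing it.
            inv += w(tour[y - 1], tour[y]) - w(tour[y], tour[y - 1])
            a, b = tour[x], tour[x + 1]
            c, d = tour[y], tour[(y + 1) % m]
            yield x, y, w(a, c) + w(b, d) - w(a, b) - w(c, d) - inv
```

A move is improving when the reported `delta` is negative; running the generator on the inverted tour covers the remaining moves, as described above.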

Observe that the time complexity of Algorithm 13 is $O(m^2)$.

Our Local adaptation of 2-opt (2-opt$_L$, $2o_L$) is based on Algorithm 13. For every pair $x$ and $y$
it finds the shortest paths $(T_{x-1}, T'_x, T'_y, T_{y-1})$ and $(T_{x+2}, T'_{x+1}, T'_{y+1}, T_{y+2})$, where $T'_i \in Cluster(T_i)$ for
$i \in \{x, x + 1, y, y + 1\}$. The time complexity of 2-opt$_L$ is $O(mns)$.

Our Global adaptation of 2-opt (2-opt$_G$, $2o_G$) exploits all the approaches proposed in Section 3.3.
Some further discussion of the 2-opt$_G$ implementation performance can be found below.

Note that 2-opt$_G$ is naturally suitable for both symmetric and asymmetric problems. However, in
order to explore the whole neighborhood for an asymmetric problem, the procedure has to be applied
twice: for a tour $T$ and then for the inverted tour $T_{inv} = (T_m, T_{m-1}, \ldots, T_1, T_m)$.

Table [H reports the running times of two Basic and three Global adaptations of 2-opt. 2-optQ is a 
fully optimized implementation that applies all the improvements discussed in Section [231 2-optQ g^^pi^ 
is a simplified variation of the algorithm that constructs layered networks L and applies CO to them on 
every iteration but does not introduce any supporting clusters or lower bounds. 2-optQ ^^^^^ is a naive 
implementation of 2-optQ that applies CO to every candidate T' G N2-opt{T). 

Table 2: Comparison of different 2-opt implementations. The reported values are running times, in ms.

                      Basic                     Global
Instance        2o_B   2o_B^adv      2o_G   2o_G^simple   2o_G^naive
10att48          0.5      0.4         0.3        0.5           0.1
12brazil58       0.0      0.2         0.1        0.5           0.4
20rat99          0.0      0.1         0.3        1.6           0.9
20kroc100        0.0      0.1         0.2        1.1           0.8
24gr120          0.0      0.1         0.3        3.3           1.1
28gr137          0.0      0.4         0.5        4.4           3.7
31pr152          0.0      0.2         0.2        3.7           3.3
40d198           0.1      0.5         1.3       17.9          20.1
45tsp225         0.1      0.3         1.3       13.5          20.6
56a280           0.1      0.5         2.2       24.2          37.1
87gr431          0.1      1.1         2.6       56.9         187.3
107att532        0.2      1.7         4.2       85.4         296.5
131p654          0.3      2.6         4.9      171.8         842.6
200dsj1000       0.9      6.8        28.0      780.4        6942.4
Average          0.2      1.1         3.3       83.2         596.9

One can see that 2-opt$_B^{\text{adv}}$ (the 'advanced' implementation, see above) is usually inefficient for the GTSP. Observe that the time required to generate the lists $l(v)$ is $O(m^2 \log m)$ while it takes only $O(m^2)$ operations to explore the whole neighborhood $N_{\text{2-opt}}(T)$ with the 'simple' algorithm. To speed up the precalculation part, we tried to include in $l(v)$ only the vertices closest to $v$, but with no success. We assume that 2-opt$_B^{\text{adv}}$ may be useful as a part of a powerful metaheuristic that needs to run 2-opt many times for one instance.

As regards the Global implementations, it follows from Table 2 that, on average, 2-opt$_G$ is more than 10 times faster than 2-opt$_G^{\text{simple}}$ and more than 100 times faster than 2-opt$_G^{\text{naive}}$. Note that the
speed-up highly depends on $m$ and is more visible for large instances. This is because 2-opt$_G^{\text{simple}}$ is $\Theta(m^2/s)$ times faster than 2-opt$_G^{\text{naive}}$ and also because the lower bound in 2-opt$_G$ is very efficient when $m$ is large, see Section 3.4.

Different adaptations of 2-opt are compared in Table 3. We measure solution error as

$$e(T) = \frac{w(T) - w(T_{\text{optimal}})}{w(T_{\text{optimal}})} \cdot 100\% ,$$

where $T_{\text{optimal}}$ is the optimal solution.

Table 3: 2-opt adaptations comparison.

                        Solution error, %                         Running time, ms
Instance       2o_B  2o_B^co   2o_L  2o_L^co   2o_G      2o_B  2o_B^co   2o_L  2o_L^co    2o_G
10att48         8.5     6.3     2.3     2.3     2.3      0.52    0.22    0.21    0.21     0.28
12brazil58     14.0     2.1     4.2     1.5     1.1      0.01    0.01    0.01    0.02     0.08
20rat99        22.1    17.1    16.5    13.7     0.8      0.01    0.05    0.03    0.04     0.33
20kroe100      15.2     1.3     5.4     2.7     0.0      0.01    0.03    0.03    0.03     0.18
24gr120        30.2    16.8     9.1    10.3    15.2      0.01    0.03    0.06    0.10     0.27
28gr137         9.6     1.9     3.6     2.7     1.9      0.02    0.05    0.05    0.06     0.46
31pr152         9.8     4.1     6.6     2.4     1.3      0.02    0.03    0.04    0.06     0.21
40d198          7.3     8.7     3.8     5.0     1.5      0.05    0.14    0.14    0.28     1.34
45tsp225       20.8    14.0    12.0     9.4     6.8      0.05    0.15    0.13    0.23     1.26
56a280         26.9    13.3    18.9    10.8    14.6      0.06    0.15    0.19    0.30     2.17
87gr431        10.3     4.8     8.7     6.9     4.2      0.14    0.48    0.37    0.52     2.63
107att532      16.8     9.2    16.1    14.2     7.9      0.22    0.69    0.58    1.02     4.21
131p654         4.1     6.9     9.0     7.7     4.0      0.33    1.42    0.74    1.48     4.88
200dsj1000     23.3    12.9    17.9    16.1    12.9      0.91    3.27    3.11    5.28    28.04
Average        15.6     8.5     9.6     7.5     5.3      0.17    0.48    0.41    0.69     3.31

The Basic adaptation 2-opt$_B$ is the fastest but also the weakest one. It takes only about 1 ms even for the largest instances; however, it is not able to change the vertex selection, which makes its solution quality noncompetitive. The 2-opt$_B^{co}$ and 2-opt$_L$ adaptations are significantly better with respect to solution quality. The most powerful adaptation, 2-opt$_G$, is only about five times slower than the next most powerful one, 2-opt$_L^{co}$, although the neighborhood of 2-opt$_G$ is significantly larger than that of 2-opt$_L^{co}$. This shows again the efficiency of the refinements proposed in Section 3.4.

3.7. 3-opt 

After removing edges $(T_x, T_{x+1})$, $(T_y, T_{y+1})$ and $(T_z, T_{z+1})$ from a tour $T$, depending on the symmetry of the problem, we get four or eight options to reconnect the tour fragments to obtain a feasible tour $T'$ such that $(T_x, T_{x+1}), (T_y, T_{y+1}), (T_z, T_{z+1}) \notin T'$. However, we limit ourselves to only one of these options, which does not turn any of the tour fragments. Note that all the other options can be replaced with sequences of two non-independent 2-opt moves (Rego and Glover, 2002) such as $Turn(Turn(T, x, y), x, z)$ or $Turn(Turn(T, x, y), y, z)$.

We implemented all the adaptations (see Section 3.2) of the 3-opt neighborhood and found that the obtained algorithms are slow rather than powerful. However, it is worth noting that the Global adaptation for 3-opt can be implemented quite efficiently. Indeed, it takes $O(n s^2 \log_2 m)$ time to find the shortest paths from every vertex $u$ to every vertex $v \notin Cluster(u)$ along the tour, see Sections 3.4.2 and 3.4.3. Then, it takes only $O(\gamma s m^2 n)$ time to perform cluster optimization for all the triples $x$, $y$, $z$. Hence, the whole algorithm's time complexity is $O(sn(s \log_2 m + \gamma m^2))$, which is at most $O(m)$ times slower than 2-opt$_G$. In addition, one can apply the lower bound for the shortest cycle (see Theorem 3), which significantly sped up the algorithm in our experiments.



3.8. Insertion

The Insertion TSP neighborhood includes all the solutions which can be obtained from the given one by removing a vertex and inserting it into some other position. Observe that $N_{\text{ins}}(T) \subset N_{\text{3-opt}}(T)$ (consider 3-opt where one of the fragments consists of exactly one vertex). The size of the Insertion neighborhood is $|N_{\text{ins}}(T)| = m(m - 2)$.

We implemented all the adaptations (see Section 3.2) for Insertion (Ins). As a quick improvement (QuickImprove) for the local adaptations Ins$_L$ and Ins$_L^{co}$, we optimize the vertices within the inserted cluster and the two clusters around its old position. For a lower bound in the Global adaptation (Ins$_G$) we use the
results of Theorem [D 

Some of these adaptations have already been used in the literature. For example, Ins$_L$ was used by Snyder and Daskin (2006) (it is called Swap there) and by Renaud and Boctor (1998) (G-opt heuristic). The Move heuristic by Bontoux et al. (2010) is Ins$_G$. However, in (Bontoux et al., 2010) the neighborhood is explored with a heuristic algorithm which does not guarantee that it finds a local minimum.

In Table 4 we provide experimental results for all the adaptations of Ins. One can see the same tendency here as in the 2-opt adaptations. Despite their quite different implementations, Ins$_B^{co}$ and Ins$_L$ have very similar performance. The Basic adaptation is extremely fast but of poor solution quality, while Ins$_L^{co}$ produces slightly better solutions in roughly twice the time. Ins$_G$ is significantly slower than Ins$_L^{co}$ but its solution quality is noticeably better, especially for the small instances.

Table 4: Ins adaptations comparison.

                        Solution error, %                         Running time, ms
Instance      Ins_B Ins_B^co  Ins_L Ins_L^co  Ins_G     Ins_B Ins_B^co  Ins_L Ins_L^co   Ins_G
10att48         4.7     2.4     0.9     0.9     0.0      0.50    0.21    0.21    0.21     0.31
12brazil58     14.0     2.1    14.5     0.1     0.0      0.01    0.01    0.01    0.01     0.13
20rat99        32.0    16.5    13.1    11.1     0.0      0.01    0.05    0.04    0.05     1.07
20kroc100      18.5     7.7    14.0     9.3     6.6      0.01    0.03    0.03    0.06     0.72
...

(The remaining rows are truncated in the source; the surviving values are 9.4, 0.20, 0.48 and 0.87.)



3.9. Swap 

The Swap TSP neighborhood $N_{\text{swap}}(T)$ contains all the solutions obtained from tour $T$ by swapping two vertices in it, see Figure 4. Observe that $|N_{\text{swap}}(T)| = m(m - 1)$.

An important observation is that Swap does not work well for near-optimal solutions. Indeed, a Swap move can be replaced with a sequence of two Ins or 2-opt moves. Moreover, the following theorem proves that a 2-opt local minimum is also a Swap local minimum for a symmetric TSP.

Figure 4: A Swap move. (a) Original tour $T$. (b) The tour $T$ after swapping $T_x$ and $T_y$.

Theorem 5. Let $T$ be a local minimum in $N_{\text{2-opt}}(T)$. Then $T$ is also a local minimum in $N_{\text{swap}}(T)$ if the problem is symmetric.

Proof. Assume that the tour $T$ is a local minimum in $N_{\text{2-opt}}(T)$ but it is not a local minimum in $N_{\text{swap}}(T)$. Then there exist some $x$ and $y$ such that $w(T') < w(T)$, where $T'$ is the tour obtained from $T$ by swapping $T_x$ and $T_y$ (see Figure 4):

$$w(T_{x-1}, T_y, T_{x+1}) + w(T_{y-1}, T_x, T_{y+1}) < w(T_{x-1}, T_x, T_{x+1}) + w(T_{y-1}, T_y, T_{y+1}) . \qquad (12)$$

Let us consider two tours: $A = Turn(T, x - 1, y)$ and $B = Turn(T, x, y - 1)$. (Without loss of generality, one may assume that $x < y$.) According to (1),

$$w(A) = w(T) + w(T_{x-1}, T_y) + w(T_x, T_{y+1}) - w(T_{x-1}, T_x) - w(T_y, T_{y+1}) \quad \text{and}$$

$$w(B) = w(T) + w(T_x, T_{y-1}) + w(T_{x+1}, T_y) - w(T_x, T_{x+1}) - w(T_{y-1}, T_y) .$$

If $T$ is a local minimum in $N_{\text{2-opt}}(T)$, then both $w(A) - w(T)$ and $w(B) - w(T)$ are non-negative and their sum is also non-negative. Since we consider a symmetric problem,

$$[w(A) - w(T)] + [w(B) - w(T)] = [w(T_{x-1}, T_y) + w(T_x, T_{y+1}) - w(T_{x-1}, T_x) - w(T_y, T_{y+1})]$$
$$+ [w(T_x, T_{y-1}) + w(T_{x+1}, T_y) - w(T_x, T_{x+1}) - w(T_{y-1}, T_y)]$$
$$= [w(T_{x-1}, T_y, T_{x+1}) + w(T_{y-1}, T_x, T_{y+1})] - [w(T_{x-1}, T_x, T_{x+1}) + w(T_{y-1}, T_y, T_{y+1})] .$$

However, according to (12), this expression is negative and, hence, our assumption is wrong and the tour $T$ is a local minimum in $N_{\text{swap}}(T)$. □
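The statement of Theorem 5 can be checked numerically on small symmetric instances. The sketch below is only an illustration: it uses a deliberately simple $O(m^3)$ descent rather than the efficient exploration discussed above, and the names `two_opt_descent` and `best_swap_change` are hypothetical.

```python
from itertools import combinations


def tour_weight(w, tour):
    """Weight of the closed tour; the last vertex links back to the first."""
    m = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % m]] for i in range(m))


def two_opt_descent(w, tour):
    """Invert improving fragments until the tour is a 2-opt local minimum."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(tour)), 2):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_weight(w, candidate) < tour_weight(w, tour):
                tour, improved = candidate, True
                break  # restart the scan from the improved tour
    return tour


def best_swap_change(w, tour):
    """Smallest weight change over all swaps of two vertices (0 if none improves)."""
    base, best = tour_weight(w, tour), 0
    for i, j in combinations(range(len(tour)), 2):
        candidate = list(tour)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        best = min(best, tour_weight(w, candidate) - base)
    return best
```

For a symmetric weight matrix, `best_swap_change(w, two_opt_descent(w, tour))` is expected to be 0, in line with the theorem; for asymmetric weights this sketch gives no such guarantee.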

Note that this result was also observed empirically by Gutin and Karapetyan (2010).



Until now, we considered only the TSP Swap neighborhood. Obviously, this result can be extended 
to the Basic adaptation but it is unclear if it holds for the Local and Global adaptations. 

Theorem 6. The result of Theorem 5 does not hold for the Local or Global adaptations of Swap, i.e., a local minimum in $N_{\text{2-opt}\,G}(T)$ is not necessarily a local minimum in $N_{\text{swap}\,L}(T)$ even if the problem is planar with Euclidean distances.

Proof. We will show an example of a GTSP tour $T$ which is a local minimum in $N_{\text{2-opt}\,G}(T)$ but not a local minimum in $N_{\text{swap}\,L}(T)$. Consider the example in Figure 5. It is a planar GTSP with Euclidean distances and 8 clusters: $\{1\}$, $\{2\}$, $\{3, 3'\}$, $\{4\}$, $\{5\}$, $\{6\}$, $\{7, 7'\}$ and $\{8\}$. The original tour is $T = (1, 2, 7', 4, 5, 6, 3', 8, 1)$. Observe that swapping $3'$ and $7'$ together with optimizing the swapped vertices (i.e., replacing $3'$ and $7'$ with $3$ and $7$, respectively) produces the optimal tour $(1, 2, 3, 4, 5, 6, 7, 8, 1)$. At the same time, no adaptation of 2-opt is able to improve $T$ because, whatever the vertex selection is, any 2-opt move will yield a tour with two intersecting (and, hence, long) edges. □

4. Fragment Optimization 

All the adaptations of the TSP local searches discussed in Section 3 are intended to improve the whole tour structure. In this section we discuss local improvements. In other words, the neighborhoods below consist of the tours that can be obtained from the original one by altering only a small fragment of it.

Figure 5: An example of a local minimum in $N_{\text{2-opt}\,G}(T)$ which is not a local minimum in $N_{\text{swap}\,L}(T)$.

Figure 6: An example of a search tree of the $\mathcal{F}_1$ algorithm for $k = 3$. The nodes at depth two are $(\Omega_1, \Omega_2)$, $(\Omega_1, \Omega_3)$, $(\Omega_2, \Omega_1)$, $(\Omega_2, \Omega_3)$, $(\Omega_3, \Omega_1)$ and $(\Omega_3, \Omega_2)$.



One can think of many kinds of fragment optimization, but we focus only on the most powerful option, i.e., a neighborhood containing all possible rearrangements in a fragment of some fixed length $k$. Consider a tour $T = (T_1, T_2, \ldots, T_m, T_1)$. Let $a = T_m$, $b = T_{k+1}$, $\Omega_i = Cluster(T_i)$ for $i = 1, 2, \ldots, k$ and $\Omega = \{\Omega_1, \Omega_2, \ldots, \Omega_k\}$. Let $FO(a, b, \Omega)$ be the set of all paths from the vertex $a$ to the vertex $b$ through all the clusters in $\Omega$ being taken in an arbitrary order. Note that $|FO(a, b, \Omega)| \in \Theta(k!\, s^k)$.¹

Using the routine for finding the shortest paths in a layered network (see Section 2), one can find the best path among $FO(a, b, \Omega)$ in $O(k! \cdot (k - 1) s^2)$ operations. In this paper, we propose two algorithms $\mathcal{F}_1$ and $\mathcal{F}_2$ that find the best path in $FO(a, b, \Omega)$ in $O(s^2 k!)$ and $O(s^2 k^2 2^k)$ time, respectively.

The $\mathcal{F}_1$ algorithm is a branch and bound algorithm. Let $S(v)$ be a sequence of distinct clusters selected from $\Omega$ assigned to search tree node $v$. Then $S(p) = (S(v)_1, S(v)_2, \ldots, S(v)_{|S(v)|-1})$ if $p$ is the parent node of $v$. Set $S(\text{root}) = \emptyset$. For an example, see Figure 6.

Let $C = S(v)$ and let $\{x_1, x_2, \ldots, x_c\} = C_{|C|}$ be the last cluster in $C$. For $i = 1, 2, \ldots, c$, let $l(v)_i$ be the weight of the shortest path from $a$ to $x_i$ through $C_1, C_2, \ldots, C_{|C|-1}$. For $i = 1, 2, \ldots, c$, let $l(v)_i = w(a, x_i)$ if $|C| = 1$. Otherwise, if $p$ is the parent node of $v$, $P = S(p)$, $\{y_1, y_2, \ldots, y_{c'}\} = P_{|P|}$ and we know $l(p)_j$ for every $j = 1, 2, \ldots, c'$, let

$$l(v)_i = \min_{j = 1, 2, \ldots, c'} \{ l(p)_j + w(y_j, x_i) \} \quad \text{for every } i = 1, 2, \ldots, c .$$

If $|C| = k$, i.e., $v$ is a tree leaf, we also calculate the shortest path from $a$ to $b$ as follows: $\min_{i = 1, 2, \ldots, c} \{ l(v)_i + w(x_i, b) \}$.

The search tree contains $\sum_{i=0}^{k} \frac{k!}{(k - i)!} < e \cdot k!$ nodes. It takes $O(s^2)$ operations to calculate the weights $l(v)$ for a node $v$. Hence, the time complexity of $\mathcal{F}_1$ is $O(s^2 k!)$.

We can improve the performance of $\mathcal{F}_1$ by calculating the lower bound at every node $v$. Let $l_{\min} = \min_{i = 1, 2, \ldots, c} l(v)_i$. Let $\lambda = \min_{X, Y \in R,\, X \neq Y} w_{\min}(X, Y)$, where $R = \Omega \setminus P \cup \{Cluster(b)\}$. Then, if $l_{\min} + \lambda(|\Omega| - |P|) > w_{\text{best}}$, where $w_{\text{best}}$ is the weight of the shortest $(a, b)$-path found so far, the node $v$ and its branch are discarded.
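The branch and bound scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clusters are lists of vertex indices, `w` is a weight matrix indexed by vertex, $\lambda$ is recomputed naively at every node, and the bound uses the single vertex `b` in place of $Cluster(b)$ (which keeps the bound valid, since the last edge must end in `b`). The name `fragment_opt_bnb` is hypothetical.

```python
import math


def fragment_opt_bnb(w, a, b, clusters):
    """Weight of the shortest (a, b)-path visiting every cluster once, in any order."""
    best = math.inf

    def w_min(X, Y):
        return min(w[x][y] for x in X for y in Y)

    def visit(last, labels, remaining):
        nonlocal best
        # labels[i]: shortest path from a to the i-th vertex of clusters[last]
        if not remaining:  # leaf: close the path at b
            best = min(best, min(l + w[x][b] for x, l in zip(clusters[last], labels)))
            return
        # Lower bound: each of the len(remaining) + 1 missing edges weighs >= lam.
        R = [clusters[last]] + [clusters[r] for r in remaining] + [[b]]
        lam = min(w_min(X, Y) for X in R for Y in R if X is not Y)
        if min(labels) + lam * (len(remaining) + 1) > best:
            return  # discard the node and its branch
        for nxt in remaining:
            new_labels = [min(l + w[y][x] for y, l in zip(clusters[last], labels))
                          for x in clusters[nxt]]
            visit(nxt, new_labels, remaining - {nxt})

    everything = frozenset(range(len(clusters)))
    for first in everything:
        visit(first, [w[a][x] for x in clusters[first]], everything - {first})
    return best
```

A practical implementation would precompute $w_{\min}(X, Y)$ for all cluster pairs once instead of rescanning the weights at every node.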



The second algorithm $\mathcal{F}_2$ is preferable for large values of $k$. It is a dynamic programming algorithm that combines the idea of Held and Karp's TSP algorithm (Papadimitriou and Steiglitz, 1998) with finding the shortest path in a layered network. Let $\Lambda \subset \Omega$ be a subset of the given clusters. We wish to find the shortest path $p_x^{\Lambda}$ from $a$ to every $x \notin \bigcup_{\Omega_i \in \Lambda} \Omega_i$ via all the clusters in $\Lambda$ taken in an arbitrary order. Observe that $p_x^{\emptyset} = w(a, x)$. Assume that, for every $Y \in \Lambda$, we know the shortest paths $p_y^{\Lambda \setminus \{Y\}}$ from $a$ to every $y \in Y$ through the clusters $\Lambda \setminus \{Y\}$. Then

$$p_x^{\Lambda} = \min_{Y \in \Lambda} \min_{y \in Y} \left\{ p_y^{\Lambda \setminus \{Y\}} + w(y, x) \right\} .$$

¹ The two algorithms below show that the Fragment Optimization problem is fixed-parameter tractable with respect to the parameter $k$. From the theoretical point of view, the second algorithm is more efficient than the first one, but experiments described later on show that for very small values of $k$ the first algorithm is actually faster. For more information on fixed-parameter tractability see, e.g., (Downey and Fellows, 1999; Niedermeier, 2006).

Hence, having the required information, one can find the shortest path from $a$ to $x$ via the clusters $\Lambda$ taken in an arbitrary order in $O(|\Lambda| s)$ time. Observe that for $\Lambda = \Omega$ and $x = b$ the algorithm finds the shortest path from $a$ to $b$ via all the clusters in the fragment.

There are $\binom{k}{|\Lambda|}$ possible subsets of clusters $\Lambda$ of a given size and for every subset there are $O((k - |\Lambda|) s)$ vertices $x$. It takes $O(|\Lambda| s)$ operations to find each of these shortest paths. Thus, the whole procedure takes

$$O\left( \sum_{|\Lambda| = 0}^{k} \binom{k}{|\Lambda|} \cdot (k - |\Lambda|) s \cdot |\Lambda| s \right) = O(s^2 k^2 2^k) \text{ operations.}$$

Hence, for small values of $k$, the first algorithm $\mathcal{F}_1$ is preferable while the second algorithm $\mathcal{F}_2$ is faster for large fragments.
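The dynamic programming recurrence for $p_x^{\Lambda}$ can be sketched with bitmasks standing for the subsets $\Lambda$. The Python fragment below is an illustration only; it assumes clusters given as lists of vertex indices, a weight matrix `w` indexed by vertex, and end vertices `a` and `b` that do not belong to the fragment's clusters. The name `fragment_opt_dp` is hypothetical.

```python
def fragment_opt_dp(w, a, b, clusters):
    """Weight of the shortest (a, b)-path through all clusters, in any order."""
    k = len(clusters)
    # p[mask][x]: shortest path from a to x via all clusters in `mask`, in any order;
    # x ranges over b and the vertices of the clusters outside `mask`.
    p = [dict() for _ in range(1 << k)]
    for mask in range(1 << k):
        outside = [x for i in range(k) if not mask >> i & 1 for x in clusters[i]]
        for x in outside + [b]:
            if mask == 0:
                p[mask][x] = w[a][x]
            else:
                # The last visited cluster is some Y in mask; its vertex y was
                # reached through the clusters of mask without Y.
                p[mask][x] = min(p[mask ^ (1 << i)][y] + w[y][x]
                                 for i in range(k) if mask >> i & 1
                                 for y in clusters[i])
    return p[-1][b]
```

Masks are processed in increasing order, so every entry `p[mask ^ (1 << i)]` is complete before it is read. The table holds $O(2^k k s)$ values, which is the usual memory price of a Held-Karp style algorithm.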

The $N_{k\text{-FO}}(T)$ neighborhood includes all the tours that can be obtained from $T$ by reordering any $k$ consecutive vertices and, possibly, replacing these vertices with some other vertices from the corresponding clusters. Let $\Phi_i^k(T)$ be the set of all tours that can be obtained from $T$ by rearranging and 'reselecting' the vertices $T_{i+1}, T_{i+2}, \ldots, T_{i+k}$ within the corresponding clusters. Then $N_{k\text{-FO}}(T) = \bigcup_{i=1}^{m} \Phi_i^k(T)$, and to explore this neighborhood we can run either the $\mathcal{F}_1$ or the $\mathcal{F}_2$ algorithm $m$ times. Observe that $|\Phi_i^k(T) \cap \Phi_j^k(T)| \gg 1$ for some $i$ and $j$ and, hence, our algorithm explores some of the candidates in $N_{k\text{-FO}}(T)$ more than once. It is a natural question whether avoiding multiple evaluations of these candidates can save any noticeable time.

Let $\Delta_i^k(T) = \{T' \in \Phi_i^k(T) : T'_{i+1} \neq T_{i+1}\}$. We assume that $k < m/2$. Then observe that $\Delta_i^k(T) \cap \Delta_j^k(T) = \emptyset$ for any $i \neq j$. Indeed, if some $T' \in \Delta_i^k(T) \cap \Delta_j^k(T)$, then $T'_{i+1} \neq T_{i+1}$ and $T'_{j+1} \neq T_{j+1}$. Since $T' \in \Delta_j^k(T)$ and the vertex $T'_{i+1}$ is modified, we get $j < i + 1 \le j + k$. At the same time, since $T' \in \Delta_i^k(T)$ and the vertex $T'_{j+1}$ is modified, $i < j + 1 \le i + k$. This is only possible if $i = j$.

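Observe that

$$\bigcup_{i=1}^{m} \Phi_i^k(T) \subset \{T\} \cup \bigcup_{i=1}^{m} \Delta_i^k(T) .$$

Indeed, if $T' \in \Phi_i^k(T)$ for some $i$, then either $T' = T$ or there exists $i < j \le i + k$ such that $T'_j \neq T_j$ and $T'_p = T_p$ for every $p = i + 1, i + 2, \ldots, j - 1$. In the latter case $T' \in \Delta_{j-1}^k(T)$. At the same time,

$$\{T\} \cup \bigcup_{i=1}^{m} \Delta_i^k(T) \subset \bigcup_{i=1}^{m} \Phi_i^k(T)$$

since $\Delta_i^k(T) \subset \Phi_i^k(T)$ and $T \in \Phi_i^k(T)$ for any $i$. Hence,

$$\{T\} \cup \bigcup_{i=1}^{m} \Delta_i^k(T) = \bigcup_{i=1}^{m} \Phi_i^k(T) = N_{k\text{-FO}}(T) .$$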

Recall that $\Delta_i^k(T) \cap \Delta_j^k(T) = \emptyset$ and observe that $|\Delta_i^k(T)| = O\left((ks - 1)\, s^{k-1} (k - 1)!\right)$. Hence, $|N_{k\text{-FO}}(T)| = O\left(m (ks - 1)\, s^{k-1} (k - 1)!\right)$.

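Compare it to $O(m s^k k!)$, which is the number of candidates considered by $m$ runs of either $\mathcal{F}_1$ or $\mathcal{F}_2$. The ratio of the two counts is $\frac{m s^k k!}{m (ks - 1) s^{k-1} (k - 1)!} = \frac{ks}{ks - 1}$, i.e., the difference is only $\Theta\left(\frac{ks}{ks - 1}\right)$ times. We conclude that this relatively small overhead is not worth further complication of the algorithm.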

Let $FO_k$ be a local search with the $N_{k\text{-FO}}(T)$ neighborhood. Then, depending on the implementation, its time complexity is either $O(m\, k!\, s^2)$ or $O(m k^2 2^k s^2)$.

Table 5: FO implementations comparison. The reported values are running times, in ms.

Although we know that $\mathcal{F}_1$ is more efficient for small values of $k$ and vice versa, empirical evaluation is required in order to find which algorithm is more efficient for particular values of $k$. We compare these implementations in Table 5. From there we see that the first implementation is faster for $k < 6$ while for $k > 6$ the second implementation is preferable, and this result holds for all the instances. For $k = 6$ both implementations perform similarly but the second one is slightly faster on average. Hence, in what follows, we use $\mathcal{F}_1$ when $k \le 5$ and $\mathcal{F}_2$ otherwise.

In Table 6 we provide results of the experimental evaluation of the FO algorithm. It is predictable that the heuristic yields very good solutions for small instances, i.e., when $k$ is close to $m$. On average, however, the solution quality of FO is relatively low. We conclude that the FO neighborhood is more interesting in combination with some other neighborhoods than as a stand-alone heuristic. Combining several neighborhoods, however, is a subject of separate research.

Table 6: FO performance for different values of $k$.

5. Data Structures 

Apart from the theoretical properties of an algorithm, implementation details may also have great 
influence on its performance. In this section, we discuss what data structures are the most efficient and 
convenient for a GTSP heuristic. 



5.1. Tour Representation 

It is a non-trivial question how one should store a GTSP solution. The most common approach is to store a sequence of vertices in the visiting order. It was used by Silberholz and Golden (2007), Tasgetiren et al. (2010) and many others. The advantages of this method are simplicity, compactness (it requires only one integer array of size $m$) and quickness of weight calculation. The disadvantages are difficulty in some tour modifications (observe that an Ins move takes $O(m)$ operations) and absence of a trivial tour correctness test. In addition, sliding along a tour in this representation requires additional measures to process a tour as a cycle, not as a finite sequence.

Another tour representation, random-key, was used by Snyder and Daskin (2006). It represents the tour as a sequence of real numbers $(x_1, x_2, \ldots, x_m)$; the $i$th number $x_i$ corresponds to the $i$th cluster $C_i$ of the problem. The integer part $\lfloor x_i \rfloor$ of the number is the vertex index within the cluster $C_i$ and the fractional part $x_i - \lfloor x_i \rfloor$ determines the position of the cluster in the tour: the clusters are ordered according to these fractional parts, in ascending order. The main advantage of random-key tours is that almost any sequence of numbers represents a correct tour; one only needs to ensure that $1 \le \lfloor x_i \rfloor \le |C_i|$ for every $i$. It is also relatively easy to implement some modifications of the tour. The disadvantages are difficulty in sliding along the tour and a high cost of the tour weighing.
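Decoding a random-key tour can be illustrated with a short Python sketch. It assumes that clusters are given as lists of vertex identifiers and that the integer part of each key is a 1-based index into the corresponding list; the function name `decode_random_key` is hypothetical.

```python
import math


def decode_random_key(keys, clusters):
    """Turn random keys (x_1, ..., x_m) into the list of visited vertices."""
    # Clusters are visited in ascending order of the fractional parts.
    order = sorted(range(len(keys)), key=lambda i: keys[i] - math.floor(keys[i]))
    # The integer part is a 1-based vertex index within the cluster.
    return [clusters[i][math.floor(keys[i]) - 1] for i in order]
```

The sort makes every decoding cost $O(m \log m)$, which illustrates why sliding along the tour and weighing it are comparatively expensive in this representation.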

We propose a new tour representation which is based on double-linked lists. We store three integer arrays of size $m$: prev, next and vertices, where prev$_i$ is the cluster preceding cluster $C_i$ in the tour, next$_i$ is the cluster succeeding cluster $C_i$ in the tour, and vertices$_i$ is the vertex within cluster $C_i$. There are several important advantages of this representation. Unlike other approaches, it naturally represents the cycle, which simplifies the algorithms. Consider, e.g., a typical local search implementation (Algorithm 15): the algorithm smoothly slides along the tour until no improvement is found for exactly one loop. Observe that one does not need the concept of position when using this tour representation; it is possible to use the cluster index instead. In this context, the procedure of tour rotation becomes meaningless; one can simply consider any cluster as the first cluster in the tour. Moreover, it allows one to find a certain cluster in $O(1)$ time; we use it, e.g., to start the CO calculations from the smallest cluster with no extra effort.

Algorithm 15 Typical implementation of a local search with a double-linked list based tour representation. The algorithm performs as few iterations as possible to ensure that the tour is a local minimum.

Initialize current cluster index $i \leftarrow 1$.
Initialize counter $t \leftarrow m$.
while $t > 0$ do
    if there exist some improvements for the current cluster $C_i$ then
        Update the tour accordingly.
        Update the counter $t \leftarrow m$.
    else
        Decrease the counter $t \leftarrow t - 1$.
    Move to the next cluster $i \leftarrow$ next$_i$.
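The sliding loop can be sketched in Python on top of the three arrays. This is only an illustration: the improvement step used here (re-selecting the vertex of the current cluster while its neighbours stay fixed) is a hypothetical stand-in for whatever neighborhood a real local search would explore, and the names `improve_vertex` and `local_search` are assumptions.

```python
def improve_vertex(i, prev, nxt, vertices, clusters, w):
    """Hypothetical improvement step: re-select the vertex of cluster i."""
    u, v = vertices[prev[i]], vertices[nxt[i]]

    def cost(x):
        return w[u][x] + w[x][v]

    best = min(clusters[i], key=cost)
    if cost(best) < cost(vertices[i]):
        vertices[i] = best
        return True
    return False


def local_search(prev, nxt, vertices, clusters, w):
    """Slide along the tour until one full loop yields no improvement."""
    m = len(nxt)
    i, t = 0, m            # current cluster index and countdown
    while t > 0:
        if improve_vertex(i, prev, nxt, vertices, clusters, w):
            t = m          # restart the countdown after every improvement
        else:
            t -= 1
        i = nxt[i]         # move to the next cluster; no positions are involved
```

The loop never refers to a tour position or to a 'first' cluster, which is the point made in the text: any cluster index can serve as the starting one.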

Our representation clearly splits the cluster order and the vertex selection; note that some algorithms 
do not require the information on the vertex selection while some others do not modify the cluster order. 
It is useful that linked lists allow quick removing and inserting of elements. Moreover, to turn the tour 
backwards, one only needs to swap the arrays prev and next. Observe that this tour representation 
is deterministic, i.e., each GTSP tour has exactly one representation in this form. If the problem is 
symmetric, every tour {prev, next, vertices) has exactly one clone {next, prev, vertices). 

The main disadvantage of this representation is that it takes three times more space than the sequence of vertices. In practice, however, many algorithms do not require backward links so one can avoid using the prev array and reduce the memory usage to two $m$-element arrays. When necessary, there is an efficient procedure to restore the prev array according to next.
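Two of the operations mentioned above, restoring prev from next and moving a cluster in constant time, can be sketched as follows. The Python fragment is a minimal illustration with 0-based cluster indices; the function names `restore_prev` and `move_cluster` are hypothetical.

```python
def restore_prev(nxt):
    """Rebuild the prev array from next in O(m)."""
    prev = [0] * len(nxt)
    for cluster, successor in enumerate(nxt):
        prev[successor] = cluster
    return prev


def move_cluster(prev, nxt, c, after):
    """Ins move in O(1): unlink cluster c and re-insert it right after `after`.

    Assumes c != after.
    """
    p, s = prev[c], nxt[c]
    nxt[p], prev[s] = s, p              # unlink c
    s = nxt[after]
    nxt[after], prev[c] = c, after      # link after -> c
    nxt[c], prev[s] = s, c              # link c -> old successor of `after`
```

Turning the tour backwards then amounts to exchanging the roles of the two arrays, e.g. `prev, nxt = nxt, prev`.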



Note that a similar tour representation was used by Tasgetiren et al. (2007).



5.2. Weight Matrix Representation 

Another important decision is how to store the weights in a GTSP instance. There are two obvious 
solutions of this problem: 

1. Store a two-dimensional matrix $M$ of size $n \times n$ as follows: $M_{i,j} = w(V_i, V_j)$. Note that this data structure stores $\sum_{i=1}^{m} |C_i|^2$ redundant weights.

2. Store $m(m - 1)$ matrices, one matrix $M^{X,Y}$ of size $|X| \times |Y|$ per every pair of distinct clusters $X$ and $Y$.

If we have a pair of vertices and need to find the weight between them, it is obviously better to use the 
first approach. However, if we need to use many weights between two clusters (consider, e.g., calculation 
of the smallest weight between clusters X and Y: Wmin{X, Y)), the second approach is preferable. Indeed, 
in the first approach we have to look for the absolute index of every vertex in X and Y . In the second 
approach, we just use the entries of the matrix M^'^. Observe also that the second approach provides a 
sequential access to the weight matrix which is friendly with respect to computer architecture and, hence, 
faster. 

Our experimental analysis shows that the second approach makes CO approximately twice as fast. However, it is not efficient, e.g., for the Basic adaptations (see Section 3.2). In our implementations, we store the weights in both forms.
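The second storage scheme can be illustrated with a small Python sketch that derives the per-pair matrices from the full matrix. It assumes clusters given as lists of vertex indices and a full matrix `w`; the names `split_by_cluster_pairs` and `w_min` are hypothetical, and nested lists only approximate the contiguous memory layout a compiled implementation would use.

```python
def split_by_cluster_pairs(w, clusters):
    """Build one |X| x |Y| block per ordered pair of distinct clusters (X, Y)."""
    return {(X, Y): [[w[x][y] for y in cy] for x in cx]
            for X, cx in enumerate(clusters)
            for Y, cy in enumerate(clusters) if X != Y}


def w_min(blocks, X, Y):
    """Smallest weight from cluster X to cluster Y, scanning one block sequentially."""
    return min(min(row) for row in blocks[X, Y])
```

With the blocks in place, computing $w_{\min}(X, Y)$ no longer needs the absolute index of every vertex in $X$ and $Y$.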

6. Conclusion 

Three classes of GTSP neighborhoods are selected and discussed in this study. The most interesting 
neighborhood in the first class is Cluster Optimization. Having nice theoretical properties, it can be 
explored very quickly which makes the CO algorithm an essential subroutine in many heuristics. Thus, 
the performance of CO is of great importance. We introduce several improvements to the algorithm and 
prove that our implementation almost reaches the best performance possible for this neighborhood. 

The TSP-inspired neighborhoods form a large class of neighborhoods derived from TSP neighborhoods. We formalize the procedure of adaptation of a TSP neighborhood for the GTSP. Among other results, by proposing several new approaches, we significantly speed up, both theoretically and in practice, exploration of the most powerful, 'Global', adaptation, making it practically useful. This is particularly interesting since the Global adaptation is well known from the literature and was used or considered many times. This indicates that there is still great room for further improvements of local search algorithms for the GTSP and other fundamental problems.

The neighborhoods of the Fragment Optimization class were not widely used before, probably because 
of their relatively poor performance. In this study, we propose an efficient exploration algorithm for the 
largest neighborhood of this class. However, this algorithm is not intended to be used as a stand-alone 
local search. We believe that it can be very effective as a part of a more sophisticated heuristic. 

Further research is required to study possible combinations of GTSP local searches. We also believe that one can significantly improve the performance of GTSP metaheuristics by using several results of this paper.